Predictive AI kidney disease models in forecasting end-stage
Imagine a world where routine doctor visits consist mainly of rapid, comprehensive screenings. With such advances, we could prevent many diseases before they significantly affect our bodies. This vision is becoming increasingly realistic as medicine and information technology work together to improve people’s lives, especially in medical algorithm development. One area I’d like to describe today is predictive AI in kidney disease.
Chronic kidney disease (CKD) affects hundreds of millions of people worldwide, straining healthcare systems and raising mortality. When CKD progresses to end-stage kidney disease (ESKD), patients require kidney replacement therapy (KRT). Early intervention in high-risk CKD patients can delay progression, improve quality of life, and reduce the costs and complications associated with KRT.
In the age of AI development in medicine, research can directly target longer, healthier lives. In this post I describe one such study, which developed five machine learning models to predict end-stage kidney disease in patients with chronic kidney disease.
Quality Data: The foundation of accurate predictive AI for kidney disease
Because kidney disease typically progresses silently, a reliable model for predicting ESKD risk at an early stage of CKD can be clinically valuable. Such a model is expected to help physicians make personalized treatment decisions for high-risk patients, thereby improving overall prognosis and reducing the economic burden of the disease. That is the main benefit of developing accurate predictive AI algorithms for kidney disease.
This study used data from 748 CKD patients followed for an average of over 6 years, with 70 patients (9.4%) developing ESKD. The cohort mainly included adult CKD patients with stable kidney function.
In addition, patient data collection covered a number of parameters, as shown in the image below. All of these matter because, in a nutshell, richer and more complete data generally lead to better machine learning predictions.
The impact of incomplete records on CKD prediction accuracy
Missing data are a very common problem in ML research; they can lead to a biased model and undermine the validity of study outcomes.
Traditional methods to handle missing data include complete case analysis, missing indicator, and single value imputation. Additional approaches encompass sensitivity analyses and model-based methods (e.g., mixed models or generalized estimating equations).
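To make three of these traditional methods concrete, here is a minimal sketch in plain Python. The records, column names, and helper functions are hypothetical and exist only for illustration; they are not taken from the study.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 60, "egfr": 45.0, "albumin": 38.0},
    {"age": 52, "egfr": None, "albumin": 41.0},
    {"age": 71, "egfr": 30.0, "albumin": None},
    {"age": 48, "egfr": 75.0, "albumin": 44.0},
]

def complete_case(rows):
    """Complete case analysis: keep only rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def add_missing_indicator(rows, column):
    """Missing indicator: add a 0/1 flag recording whether `column` was missing."""
    return [{**r, f"{column}_missing": int(r[column] is None)} for r in rows]

def mean_impute(rows, column):
    """Single value imputation: fill missing entries with the observed mean."""
    observed_mean = mean(r[column] for r in rows if r[column] is not None)
    return [
        {**r, column: observed_mean if r[column] is None else r[column]}
        for r in rows
    ]

complete = complete_case(records)                  # drops the two incomplete rows
flagged = add_missing_indicator(records, "egfr")   # keeps all rows, adds a flag
imputed = mean_impute(records, "egfr")             # keeps all rows, fills the gap
```

Complete case analysis discards half of this toy dataset, while single value imputation keeps every row but gives all imputed patients the same value. That loss of information or variability is part of why these methods can bias a model.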
In the future, multiple imputation is expected to become a routine method for handling missing data in ML research, because its extra computational cost over the traditional methods is easily covered by the computing power that ML already requires.
Machine Learning’s Big Five: Evaluating performance in CKD detection
All categorical variables, such as insurance status, education, and primary disease, were encoded using the one-hot approach. Any variable with more than 50% missing values was removed from model development. Missing data were handled using multiple imputation with five repetitions, producing five slightly different imputed datasets in which each missing value was randomly sampled from its predictive distribution given the observed data.
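The preprocessing steps described above could look roughly like the sketch below. It assumes pandas and scikit-learn, uses synthetic data with made-up column names, and uses scikit-learn's IterativeImputer with sample_posterior=True as one possible way to draw imputations from a predictive distribution. The post does not say which software the authors used, so treat this as an illustration rather than their pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed to import IterativeImputer)
from sklearn.impute import IterativeImputer

# Synthetic stand-in data; column names and distributions are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(58, 12, n),
    "egfr": rng.normal(46, 20, n),
    "albumin": rng.normal(40, 5, n),
    "rare_lab": rng.normal(1.0, 0.2, n),
    "education": rng.choice(["primary", "secondary", "college"], n),
})
df.loc[df.index[::7], "albumin"] = np.nan    # roughly 15% missing
df.loc[df.index[:140], "rare_lab"] = np.nan  # 70% missing

# Step 1: remove variables with more than 50% missing values.
missing_share = df.isna().mean()
df = df.loc[:, missing_share <= 0.5]

# Step 2: one-hot encode the categorical variable.
X = pd.get_dummies(df, columns=["education"], dtype=float)

# Step 3: multiple imputation with five repetitions. Each repetition uses a
# different seed, so every missing value is sampled anew and the five
# resulting datasets differ slightly.
imputed_sets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    filled = imputer.fit_transform(X)
    imputed_sets.append(pd.DataFrame(filled, columns=X.columns))
```

In a multiple-imputation workflow, a model is typically fitted on each of the five datasets and the results are then combined, so that the uncertainty about the missing values carries through to the final estimates.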
How well do ML algorithms predict chronic kidney disease?
The researchers trained the models to perform a binary classification task: generating the probability of ESKD (ESKD+) from the given features. They used five ML algorithms: logistic regression, naïve Bayes, random forest, decision tree, and K-nearest neighbours. The main results were:
- Three ML models (logistic regression, naïve Bayes, and random forest) showed equivalent predictive performance and greater sensitivity compared with the established Kidney Failure Risk Equation (KFRE).
- The random forest algorithm had the best overall performance, with an AUC of 0.81.
- The KFRE model, which is based on three simple variables, showed not only a comparable AUC but also the highest accuracy and precision.
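For readers who want to see what such a comparison looks like in practice, here is a minimal sketch of training the five algorithm families on a binary outcome and scoring their predicted probabilities by AUC. It assumes scikit-learn and uses synthetic data that only mimics the cohort size and the roughly 10% event rate. The hyperparameters, the feature scaling, and the five-fold cross-validation are my own choices for illustration and will not reproduce the study's numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary dataset (not the study data).
X, y = make_classification(
    n_samples=748, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Hypothetical settings; distance- and gradient-based models get scaled inputs.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "k-nearest neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

# Out-of-fold predicted probabilities of the positive class, scored by AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_by_model = {}
for name, model in models.items():
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    auc_by_model[name] = roc_auc_score(y, proba)
    print(f"{name}: AUC = {auc_by_model[name]:.2f}")
```

Stratified folds keep the share of positive cases similar in every split, which matters when only about one patient in ten reaches the outcome.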
Table 1. The performance of all algorithms
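The metrics used in this comparison (AUC, sensitivity, accuracy, and precision) can be computed by hand. The sketch below does so in plain Python on a hypothetical set of ten patients; the labels, scores, and the 0.5 decision threshold are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def auc(y_true, scores):
    """Chance that a random positive case scores above a random negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = progressed to ESKD) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # share of true ESKD cases that were caught
precision = tp / (tp + fp)     # share of positive predictions that were right
accuracy = (tp + tn) / len(y_true)
auc_value = auc(y_true, scores)
```

These metrics can disagree on imbalanced data. With 70 events among 748 patients, a rule that never predicts ESKD would be right 678/748 ≈ 90.6% of the time while having zero sensitivity, so high accuracy alone does not show that a model finds the patients who need early intervention.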
Data-Driven Results: How ML algorithms fare in CKD forecasting
The study demonstrated the feasibility of using ML to assess CKD prognosis based on easily accessible features.
In conclusion, this study provides evidence that machine learning models can effectively predict ESKD risk in CKD patients. These models potentially offer a more convenient screening tool, especially in areas where comprehensive urine testing is not always available. However, the authors note that further external validation and model improvements with additional predictor variables are needed for clinical translation.
If this topic interests you, read our previous post about Cancer prognosis and predictive analytics in medicine.
References:
Qiong Bai, Chunyan Su, Wen Tang & Yike Li: Machine learning to predict end stage kidney disease in chronic kidney disease, https://www.nature.com/articles/s41598-022-12316-z