Abstract: TH-PO010
Using Machine-Learning Algorithms to Predict the Risk of AKI and CKD during the COVID-19 Pandemic Based on National Electronic Health Records
Session Information
- Augmented Intelligence for Prediction and Image Analysis
October 24, 2024 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Zhang, Yue, Penn State College of Medicine, Hershey, Pennsylvania, United States
- Ghahramani, Nasrollah, Penn State College of Medicine, Hershey, Pennsylvania, United States
- Chinchilli, Vernon M., Penn State College of Medicine, Hershey, Pennsylvania, United States
- Ba, Djibril, Penn State College of Medicine, Hershey, Pennsylvania, United States
Background
Machine learning algorithms have shown promise in predicting the risk of acute kidney injury (AKI) and chronic kidney disease (CKD) within local healthcare organizations. However, their performance on national electronic health records (EHR) remains unclear, as does the impact of including COVID-19 infection history in predictive models.
Methods
Data were sourced from the TriNetX Research Network, with a study cohort period from 2022/07/01 to 2024/03/31. Incident AKI or CKD after the index date was identified using ICD-10 codes. Covariates were assessed within 1 year before the index date and included 4 demographics, 33 comorbidities, 18 lab test results, 13 medications, 5 vital signs, 3 visit histories, and COVID-19 infections. Missing values were imputed, and upsampling was used to address class imbalance. Extreme gradient boosting was used to build separate models for AKI and CKD, one for each of the two prediction windows (30 days and 1 year).
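The abstract does not say which upsampling method was used. As a minimal sketch, assuming simple random oversampling with replacement (an assumption, not the authors' stated method), the idea can be illustrated with the standard library alone. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (sampled with replacement) until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw as many extra copies as needed to reach the target size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (hypothetical): four patients without the outcome, one with it.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Upsampling of this kind is usually applied to the training split only, so that evaluation metrics reflect the original class mix.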
Results
We included 104,565 participants. The area under the receiver operating characteristic curve (AUROC) was stable across the prediction windows, with the highest value for CKD within 30 days (0.87) and the lowest for AKI within 1 year (0.79) (Table). Counts of inpatient visits and eGFR were the most important variables for predicting AKI and CKD, respectively. The count of COVID-19 infections was the most important variable among baseline conditions (Figure).
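For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. A minimal sketch of that pairwise definition follows. The function name `auroc` and the toy labels and scores are hypothetical and are not study data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 0.75
```

The double loop is quadratic in the number of cases, so it is only meant to show the definition. Cohorts of this size would normally use a rank-based implementation from a statistics library.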
Conclusion
Machine learning with national EHR data shows promising performance for AKI and CKD predictions. COVID-19 status should be included in the prediction model.
Performance of Predictive Models
Prediction Models | AUROC | Precision | Sensitivity | Specificity | Accuracy | F1 score |
Incidence of CKD within 1 year | 0.8624 | 0.9958 | 0.7944 | 0.7926 | 0.7944 | 0.8838 |
Incidence of AKI within 1 year | 0.7880 | 0.9937 | 0.8081 | 0.6403 | 0.8057 | 0.8913 |
Incidence of CKD within 30 days | 0.8659 | 0.9991 | 0.8410 | 0.7778 | 0.8408 | 0.9133 |
Incidence of AKI within 30 days | 0.8025 | 0.9987 | 0.7681 | 0.7248 | 0.7680 | 0.8684 |
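The F1 score column can be cross-checked against the Precision and Sensitivity columns, because F1 is the harmonic mean of the two. The sketch below recomputes it for each row of the table. The helper `f1_score` and the dictionary layout are illustrative choices, and the inputs are the rounded values as printed, so agreement is only expected to within rounding.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, reported F1) copied from the table above.
table = {
    "CKD within 1 year": (0.9958, 0.7944, 0.8838),
    "AKI within 1 year": (0.9937, 0.8081, 0.8913),
    "CKD within 30 days": (0.9991, 0.8410, 0.9133),
    "AKI within 30 days": (0.9987, 0.7681, 0.8684),
}

# Absolute gap between the recomputed and the reported F1 for each model.
gaps = {
    name: abs(f1_score(p, r) - reported)
    for name, (p, r, reported) in table.items()
}
max_gap = max(gaps.values())
```

On these inputs every recomputed value falls within 0.0001 of the reported F1, which is consistent with four-decimal rounding.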