Abstract: PO0759
Feature Selection and Machine Learning Model for Predicting Diabetic Kidney Disease Risk in Asians
Session Information
- Diabetic Kidney Disease: Clinical
November 04, 2021 | Location: On-Demand, Virtual Only
Abstract Time: 10:00 AM - 12:00 PM
Category: Diabetic Kidney Disease
- 602 Diabetic Kidney Disease: Clinical
Authors
- Sabanayagam, Charumathi, Singapore Eye Research Institute, Singapore, Singapore
- He, Feng, Singapore Eye Research Institute, Singapore, Singapore
- Nusinovici, Simon, Singapore Eye Research Institute, Singapore, Singapore
- Lim, Cynthia Ciwei, Singapore General Hospital, Singapore, Singapore
- Li, Jialiang, National University of Singapore, Singapore, Singapore
- Wong, Tien Yin, Singapore Eye Research Institute, Singapore, Singapore
- Cheng, Ching-Yu, Singapore Eye Research Institute, Singapore, Singapore
Background
Machine learning (ML) techniques may improve disease prediction and the interpretability of regression models by identifying the most relevant features in multi-dimensional data. We evaluated the ability of various ML classifiers to identify relevant features and to improve the prediction accuracy of diabetic kidney disease (DKD).
Methods
We utilized longitudinal data from 1364 Chinese, Malay and Indian participants aged 40-80 years with diabetes but free of DKD who attended the baseline visit of the Singapore Epidemiology of Eye Diseases Study in 2004-2011 and were followed up for 6 years (2011-2017). Incident DKD (n=162) was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m² plus a 25% decrease in eGFR at follow-up. We evaluated 339 features, including demographic/clinical, retinal imaging, genetic and serum metabolomic profiles, and tested nine ML algorithms combined with feature selection (gradient boosting decision tree, elastic net, random forest, support vector machine, neural network, LASSO, etc.). The performance of the best ML model, built on the optimum feature set, was compared with that of logistic regression (LR) using traditional risk factors, based on the area under the receiver operating characteristic curve (AUC), sensitivity and specificity.
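As an illustration of the three comparison metrics named above, the following is a minimal sketch in plain Python. The function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for the example; they are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = incident DKD, 0 = no DKD.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]          # predicted risk
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 cut-off

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

AUC does not depend on the cut-off, whereas sensitivity and specificity do, so two models should be compared at a stated threshold.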
Results
The best-performing model combined recursive feature elimination (RFE) for variable selection with Elastic Net (EN), using 15 predictors from the demographic/clinical + metabolite set. Its AUC, sensitivity and specificity were 0.852, 83.0% and 73.5%, respectively, compared with 0.796, 83.0% and 61.8% for LR. The top 15 predictors of DKD risk included seven risk factors and eight metabolites: age, antidiabetic medication use, presence of hypertension, diabetic retinopathy, higher levels of systolic blood pressure and HbA1c, and lower levels of eGFR; higher levels of triglycerides in IDL, phospholipids in chylomicrons and medium VLDL, total cholesterol in chylomicrons and very small VLDL, medium LDL, cholesterol esters in very large HDL, and lower levels of DHA, lactate and acetate.
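The combination of RFE and Elastic Net described above could be set up roughly as follows. This is a sketch only: it assumes scikit-learn, uses an elastic-net-penalized logistic regression as the classifier, and runs on synthetic data from `make_classification`. The hyperparameters (`l1_ratio`, `C`), the data dimensions and the function name `fit_rfe_elastic_net` are illustrative assumptions, not the authors' implementation, and argument names may differ between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_rfe_elastic_net(X, y, n_keep=15):
    """Fit a scaler followed by RFE wrapped around an elastic-net logistic model."""
    # Assumed hyperparameters; in practice these would be tuned.
    enet = LogisticRegression(penalty="elasticnet", solver="saga",
                              l1_ratio=0.5, C=1.0, max_iter=5000)
    pipe = Pipeline([
        ("scale", StandardScaler()),  # penalized models need comparable scales
        ("rfe", RFE(estimator=enet, n_features_to_select=n_keep, step=1)),
    ])
    return pipe.fit(X, y)


# Synthetic stand-in for the study data (hypothetical sizes and class balance).
X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           n_redundant=4, weights=[0.88, 0.12], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = fit_rfe_elastic_net(X_train, y_train, n_keep=15)
selected = model.named_steps["rfe"].support_   # boolean mask of kept features
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

RFE drops the feature with the smallest absolute coefficient at each step and refits, so standardizing inside the pipeline keeps the coefficients comparable and avoids leaking test-set statistics into training.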
Conclusion
Combining ML with feature selection improved the accuracy of DKD risk prediction in the general population with diabetes and identified novel risk factors, including metabolites.
Funding
- Government Support – Non-U.S.