Abstract: SA-PO758
Random Survival Forest Identifies Key Factors Predicting CKD Progression: The CRIC Study
Session Information
- CKD: Epidemiology, Risk Factors, Prevention - III
October 27, 2018 | Location: Exhibit Hall, San Diego Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: CKD (Non-Dialysis)
- 1901 CKD (Non-Dialysis): Epidemiology, Risk Factors, and Prevention
Authors
- Zheng, Zihe, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Xie, Dawei, University of Pennsylvania School of Medicine Center for Clinical Epidemiology and Biostatistics, Philadelphia, Pennsylvania, United States
- Weiner, Shoshana, University of Maryland School of Medicine, Baltimore, Baltimore, Maryland, United States
- Chen, Jing, Tulane School of Medicine, New Orleans, Louisiana, United States
- Cohen, Debbie L., University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States
- Drawz, Paul E., University of Minnesota, Minneapolis, Minnesota, United States
- Raj, Dominic S., GWU Medical Faculty Associates, Washington, District of Columbia, United States
- Sondheimer, James H., Wayne State University School of Medicine, Detroit, Michigan, United States
- Feldman, Harold I., University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Lash, James P., University of Illinois at Chicago, Chicago, Illinois, United States
- Anderson, Amanda Hyre, University of Pennsylvania, Philadelphia, Pennsylvania, United States
Group or Team Name
- CRIC (Chronic Renal Insufficiency Cohort)
Background
Increasing numbers of clinically available and novel factors bring opportunities to improve CKD risk prediction, but also statistical challenges: multiple comparisons, nonlinearity, variable interactions, and missing data. We applied random survival forest (RSF), a machine learning technique, to identify factors associated with CKD progression.
Methods
We studied 3,939 subjects in the Chronic Renal Insufficiency Cohort and followed them for 12 years for the composite survival outcome of eGFR halving or incident ESRD. As candidate predictors, we used 73 clinically available and 25 novel baseline variables, which covered a broad spectrum of socio-demographics, comorbidities, physical and laboratory measurements, and medications. We applied the RSF approach with 1,000 bootstrap iterations and the log-rank splitting rule. We calculated variable importance and minimal depth to rank predictors according to their impact on prediction accuracy. We also graphed the adjusted relationships between CKD progression and each of the 10 strongest predictors.
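As an illustration of the log-rank splitting rule named above, the following is a minimal Python sketch of how one candidate split of a tree node can be scored. It is not the authors' implementation; the function names `logrank_split_statistic` and `best_threshold`, and the toy data in the usage note, are hypothetical. A survival tree would evaluate many candidate splits per node and keep the one with the largest statistic.

```python
import math

def logrank_split_statistic(times, events, in_left):
    """Standardized two-sample log-rank statistic for one candidate split.

    times   : observed follow-up times
    events  : 1 if the outcome occurred at that time, 0 if censored
    in_left : True if the subject would go to the left daughter node
    """
    # Only times at which at least one event occurred contribute.
    event_times = sorted({t for t, e in zip(times, events) if e})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [k for k in range(len(times)) if times[k] >= t]
        y = len(at_risk)                                  # at risk overall
        y_left = sum(1 for k in at_risk if in_left[k])    # at risk, left node
        d = sum(1 for k in at_risk if times[k] == t and events[k])
        d_left = sum(1 for k in at_risk
                     if times[k] == t and events[k] and in_left[k])
        # Observed minus expected events in the left node.
        numerator += d_left - y_left * d / y
        if y > 1:
            # Hypergeometric variance of the left-node event count.
            variance += (y_left / y) * (1 - y_left / y) * ((y - d) / (y - 1)) * d
    # A split that puts everyone on one side has zero variance; score it 0.
    return abs(numerator) / math.sqrt(variance) if variance > 0 else 0.0

def best_threshold(times, events, x):
    """Pick the cut point c (split rule x <= c) with the largest statistic.

    Assumes x has at least two distinct values.
    """
    candidates = sorted(set(x))[:-1]  # the largest value would leave the right node empty
    return max(
        candidates,
        key=lambda c: logrank_split_statistic(times, events, [v <= c for v in x]),
    )
```

For example, `best_threshold([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 1, 0], [9, 8, 7, 3, 2, 1])` returns the cut point on the toy predictor that best separates early from late events. In a forest, each tree repeats this search on a bootstrap sample, typically over a random subset of predictors at each node.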
Results
After setting outliers to missing, we included 3,921 individuals in the analysis. The RSF algorithm imputed missing data and incorporated variable interactions. The 98-predictor RSF model yielded a low prediction error of 14.2% (1 minus Harrell’s C-index). The 10 predictors with the highest variable importance values and smallest minimal depths were urine protein/creatinine ratio, urine albumin/creatinine ratio, eGFR, serum urea nitrogen, parathyroid hormone, serum albumin, high-sensitivity troponin T, systolic blood pressure, NT-proBNP, and FGF-23. The partial dependence plots show the relationships between predicted survival at 1, 2, and 5 years and each of these 10 factors.
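The prediction error reported above is defined as 1 minus Harrell's C-index. The sketch below shows one simplified way to compute that quantity in plain Python; it is an illustration, not the authors' code. The function names are hypothetical, and pairs with tied observed times are skipped for brevity, which production implementations handle more carefully.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the subject with the higher
    predicted risk has the shorter observed time.

    times       : observed follow-up times
    events      : 1 if the outcome occurred, 0 if censored
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: tied times are skipped
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def prediction_error(times, events, risk_scores):
    """Prediction error expressed as 1 minus Harrell's C-index."""
    return 1.0 - harrell_c_index(times, events, risk_scores)
```

With toy inputs `times = [2, 4, 6, 8]`, `events = [1, 1, 0, 1]`, and `risk_scores = [0.9, 0.3, 0.5, 0.1]`, five pairs are comparable and four are concordant, so the C-index is 0.8 and the prediction error is 0.2. On this scale, 0.5 corresponds to random ranking and 0 to perfect ranking.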
Conclusion
Using the RSF method, we identified and ranked the variables most important to CKD progression from among 98 clinically available and novel factors. Utilizing high-dimensional data enabled us to predict outcomes with a low error rate.
Partial dependence plot
Funding
- NIDDK Support