Abstract: FR-PO1177
RNA Sequencing (RNA-seq)-Based Machine-Learning Models for Kidney Injury Genes in Patients with CKD
Session Information
- CKD: Mechanisms - 2
October 25, 2024 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: CKD (Non-Dialysis)
- 2303 CKD (Non-Dialysis): Mechanisms
Authors
- Sun, Feifei, Yanshan University, Qinhuangdao, Hebei, China
- Cai, Jiahui, Yanshan University, Qinhuangdao, Hebei, China
- Sun, Yunbo, Yanshan University, Qinhuangdao, Hebei, China
- Pan, Qiaoyun, Yanshan University, Qinhuangdao, Hebei, China
- Zhao, Shasha, Yanshan University, Qinhuangdao, Hebei, China
- Wang, Danshu, Yanshan University, Qinhuangdao, Hebei, China
- Tan, Runyan, Yanshan University, Qinhuangdao, Hebei, China
- Yang, Feng, Yanshan University, Qinhuangdao, Hebei, China
- Yan, Yanling, Yanshan University, Qinhuangdao, Hebei, China
Background
Chronic kidney disease (CKD) is prevalent and is associated with high patient mortality and healthcare costs, yet its underlying mechanisms are not well characterized. We hypothesized that applying machine learning (ML) to RNA-seq data would help identify the gene networks that drive disease and support early diagnosis of CKD.
Methods
We retrieved CKD datasets from the Gene Expression Omnibus (GEO) database. GSE180394 and GSE47184 were randomly split for training, and GSE37455 was used for testing. Differentially expressed genes (DEGs) between CKD samples and healthy controls were screened. Weighted gene co-expression network analysis (WGCNA) then selected CKD-related modules and critical genes as potential candidates. Characteristic hub genes were identified by applying two ML algorithms independently: support vector machine-recursive feature elimination (SVM-RFE) and least absolute shrinkage and selection operator (LASSO). Several ML models were built to predict CKD. The relationship between the potential targets and traditional renal function parameters was also examined.
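The hub-gene selection step described above can be illustrated with a minimal sketch. It assumes scikit-learn, a synthetic expression matrix in place of the GEO data, an L1-penalized logistic regression as the LASSO step, and arbitrary settings (C=0.1, ten genes retained by SVM-RFE). Taking the intersection of the two gene lists is also an assumption. None of these choices are taken from the abstract, and the gene names are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a samples-by-genes expression matrix (not GEO data).
X, y = make_classification(
    n_samples=120, n_features=40, n_informative=5,
    n_redundant=0, random_state=0,
)
gene_names = np.array([f"gene_{i}" for i in range(X.shape[1])])  # placeholder names
X = StandardScaler().fit_transform(X)

# LASSO-style selection: an L1 penalty shrinks uninformative coefficients to zero.
# C=0.1 is an arbitrary illustrative value; in practice it would be cross-validated.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X, y)
lasso_genes = set(gene_names[np.flatnonzero(lasso.coef_.ravel())])

# SVM-RFE: repeatedly fit a linear SVM and drop the lowest-weight gene.
svm_rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10, step=1)
svm_rfe.fit(X, y)
svm_genes = set(gene_names[svm_rfe.support_])

# Candidate hub genes: those retained by both methods (assumed combination rule).
hub_candidates = sorted(lasso_genes & svm_genes)
print(hub_candidates)
```

In a real analysis the input would be restricted to the DEG and WGCNA candidates, and the selection would be fitted on the training data only, so that the test set stays untouched.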
Results
WGCNA and the ML algorithms identified five central hub genes: COL10A1, DUSP1, GADD45A, TSC22D3, and ZFAND5. Multivariate logistic regression analysis indicated that these genes could be instrumental in diagnosing CKD. Four ML models were established, as shown in Figure 1. The models performed robustly in the validation cohort, with areas under the receiver operating characteristic (ROC) curve (AUC) of 93.6%, 90.7%, 93.8%, and 95.0%, respectively. Analysis of the Nephroseq database revealed a positive correlation between GFR and the hub genes GADD45A (R=0.55, p=0.0084) and TSC22D3 (R=0.53, p=0.011), as well as a negative correlation between these two hub genes and SCr (R=-0.61, p=0.0096; R=-0.87, p=0.011, respectively), supporting their predictive accuracy for CKD. In addition, immune infiltration analysis showed that these hub genes affected the recruitment and infiltration levels of immune cells in CKD.
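For readers less familiar with the AUC metric reported above, the following sketch shows how it can be computed from predicted scores. It uses only the Python standard library. The labels and scores are made up for illustration and are unrelated to the study data, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = CKD, 0 = healthy control; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # AUC = 0.889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported values in context.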
Conclusion
ML approaches give RNA-seq analysis promising potential for diagnosing renal injury. Implementing the ML model could help identify patients with CKD who would benefit from early intervention to delay CKD progression and reduce healthcare costs.
Funding
- Government Support – Non-U.S.