Abstract: PUB081
Identification of Kidney Cell Types in Single-Cell RNA Sequencing and Single-Nucleus RNA Sequencing Data Using Machine-Learning Algorithms
Session Information
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Madapoosi, Siddharth S., University of Michigan, Ann Arbor, Michigan, United States
- Tisch, Adam, University of Michigan, Ann Arbor, Michigan, United States
- Blough, Stephen A., University of Michigan, Ann Arbor, Michigan, United States
- Rosa, Jan S., University of Michigan, Ann Arbor, Michigan, United States
- Eddy, Sean, University of Michigan, Ann Arbor, Michigan, United States
- Naik, Abhijit S., University of Michigan, Ann Arbor, Michigan, United States
- Limonte, Christine P., University of Washington, Seattle, Washington, United States
- McCown, Phillip J., University of Michigan, Ann Arbor, Michigan, United States
- Menon, Rajasree, University of Michigan, Ann Arbor, Michigan, United States
- Rosas, Sylvia E., Joslin Diabetes and Endocrinology Research Center, Boston, Massachusetts, United States
- Parikh, Chirag R., Johns Hopkins University, Baltimore, Maryland, United States
- Mariani, Laura H., University of Michigan, Ann Arbor, Michigan, United States
- Kretzler, Matthias, University of Michigan, Ann Arbor, Michigan, United States
- Mahfouz, Ahmed, Leids Universitair Medisch Centrum, Leiden, Zuid-Holland, Netherlands
- Alakwaa, Fadhl, University of Michigan, Ann Arbor, Michigan, United States
Group or Team Name
- Kidney Precision Medicine Project (KPMP)
Background
Single-cell RNA (scRNA) and single-nucleus RNA (snRNA) sequencing offer researchers valuable insight into the biological states of kidney cells. Manual cell-type annotation requires extensive domain expertise, is time-consuming, and does not scale well.
Methods
We compared the ability of 5 machine learning (ML) algorithms, namely support vector machines, random forests (RF), multilayer perceptrons (MLP), k-nearest neighbors (KNN), and XGBoost, to predict kidney cell types from scRNA/snRNA-seq data. We used publicly available human kidney datasets comprising 62,120 cells from 40 donor biopsies across 5 expert-annotated sc/snRNA-seq studies. We integrated all 5 studies using the Seurat rPCA integration protocol. For each combination, we trained the algorithms on 4 datasets and tested them on the held-out fifth. We compared the algorithms using F1 scores and the percentage of cells classified as unknown.
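The hold-one-dataset-out evaluation and the two reported metrics can be sketched as follows. This is a minimal illustration and not the authors' code: the study names, the "Unknown" label string, and the choice to take the median F1 across cell types are all assumptions made for the example.

```python
from statistics import median

def leave_one_study_out(studies):
    """Yield (training_studies, test_study) pairs, holding out each study once."""
    for held_out in studies:
        train = [s for s in studies if s != held_out]
        yield train, held_out

def per_class_f1(y_true, y_pred):
    """Return {cell_type: F1} for every cell type present in the true labels.

    A cell predicted as 'Unknown' simply counts as a miss for its true type.
    """
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def percent_unknown(y_pred, unknown="Unknown"):
    """Percentage of cells the classifier declined to label."""
    return 100.0 * sum(p == unknown for p in y_pred) / len(y_pred)

# Hypothetical study names and toy labels, for illustration only.
studies = ["study_A", "study_B", "study_C", "study_D", "study_E"]
folds = list(leave_one_study_out(studies))

y_true = ["PT", "PT", "PT", "POD", "POD", "FIB"]
y_pred = ["PT", "PT", "Unknown", "POD", "PT", "FIB"]
f1_by_type = per_class_f1(y_true, y_pred)
median_f1 = median(f1_by_type.values())
pct_unknown = percent_unknown(y_pred)
```

With 5 studies this produces 5 folds, each training on 4 studies and testing on the remaining one, which matches the layout of one column per testing dataset described for Figure 1.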
Results
Upon integration of all 5 datasets, we identified a total of 16 harmonized cell types. All 5 algorithms predicted these cell types with high accuracy (median F1 = 0.94 and 1.8% unknown cells). No algorithm was superior to the others overall (p > 0.05). All algorithms successfully rejected cell types absent from the training data, most notably the RF, MLP, and KNN models. F1 scores were lower across ML algorithms when snRNA-seq data were used for testing, particularly for proximal tubule and fibroblast cells.
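The abstract does not state how cells were assigned to the unknown class. One common approach, shown here purely as an assumption, is to reject any cell whose highest predicted class probability falls below a cutoff. The function name, the 0.5 threshold, and the probability values are hypothetical.

```python
def predict_with_rejection(class_probs, threshold=0.5, unknown="Unknown"):
    """Label each cell with its most probable type, or 'Unknown' if confidence is low.

    class_probs: list of {cell_type: probability} dicts, one per cell.
    threshold:   hypothetical cutoff; not a value reported in the abstract.
    """
    labels = []
    for probs in class_probs:
        best = max(probs, key=probs.get)
        labels.append(best if probs[best] >= threshold else unknown)
    return labels

# Toy probabilities: the second cell has no confident match and is rejected.
toy_probs = [
    {"PT": 0.90, "POD": 0.10},
    {"PT": 0.40, "POD": 0.35, "FIB": 0.25},
]
toy_labels = predict_with_rejection(toy_probs)
```

Under this scheme, a cell type missing from the training data would ideally receive low probabilities for every known class and so be counted as unknown rather than mislabeled.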
Conclusion
ML algorithms accurately annotated adult kidney cell types from scRNA-seq data and, to a lesser extent, snRNA-seq data. Our study methodology may be applied to other validation cohorts, with the expectation that prediction accuracy will improve over time.
Figure 1. Heatmaps showing the performance of each classification algorithm on each testing dataset by (a) median F1 score or (b) median % of cells classified as unknown.
Funding
- NIDDK Support