Abstract: SA-PO392
Studying Rare Disease Using an Electronic Health Record (EHR) and Machine Learning Based Approach: The Kaiser Permanente Southern California (KPSC) Membranous Nephropathy (MN) Cohort
Session Information
- Glomerular Diseases: Clinical, Outcomes, Trials - III
October 27, 2018 | Location: Exhibit Hall, San Diego Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Glomerular Diseases
- 1203 Glomerular Diseases: Clinical, Outcomes, and Trials
Authors
- Sun, Amy Z., Kaiser Permanente Los Angeles Medical Center, Los Angeles, California, United States
- Shu, Yu-Hsiang, Kaiser Permanente, Pasadena, California, United States
- Harrison, Teresa N., Kaiser Permanente, Pasadena, California, United States
- O'Shaughnessy, Michelle M., Stanford University, Palo Alto, California, United States
- Sim, John J., Kaiser Permanente Los Angeles Medical Center, Los Angeles, California, United States
Background
Large-scale epidemiologic studies of glomerular diseases such as MN are needed. Identifying MN patients in the EHR is limited by the need to manually review kidney biopsy pathology reports (the gold-standard diagnostic test) to confirm cases. The ability to accurately identify patients with MN using only structured EHR data (e.g., diagnosis codes) would improve the efficiency and scale of observational and comparative effectiveness studies in this population.
Methods
A retrospective cohort study was performed among KPSC patients who underwent a kidney biopsy between 6/28/1999 and 6/25/2015 (n=5542). Biopsies were manually reviewed and classified as MN or non-MN. The sensitivity (SN), specificity (SP), and positive predictive value (PPV) of ICD-9 diagnosis codes appearing within 1 year after biopsy were determined using 2 approaches: 1) a clinical approach (the MN-specific codes 581.1, 582.1, or 583.1) and 2) a machine learning approach (the ICD-9 codes, kidney-related or not, with the highest predictive performance).
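To make the evaluation concrete, here is a minimal sketch of how a code-based case definition could be scored against the biopsy gold standard. It assumes each patient is represented by a biopsy label, a biopsy date, and a list of (ICD-9 code, diagnosis date) records; that "within 1 year" means 0 to 365 days after biopsy; and that every diagnosis record counts once toward the code threshold. The helper names `flag_patient` and `diagnostic_metrics` and the toy records are hypothetical and are not the study's actual pipeline or data.

```python
from datetime import date

# MN-specific ICD-9 codes used by the clinical approach (from the Methods).
MN_CODES = {"581.1", "582.1", "583.1"}


def flag_patient(biopsy_date, dx_events, codes=MN_CODES, window_days=365, min_codes=1):
    """Hypothetical helper: True if at least `min_codes` qualifying diagnosis
    records fall within `window_days` after the biopsy (assumed 0..365 days)."""
    n = sum(
        1
        for code, dx_date in dx_events
        if code in codes and 0 <= (dx_date - biopsy_date).days <= window_days
    )
    return n >= min_codes


def diagnostic_metrics(truth, predicted):
    """Sensitivity, specificity, and PPV of `predicted` against biopsy `truth`."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tn / (tn + fp) if tn + fp else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sn, sp, ppv


# Invented toy records: (MN on biopsy?, biopsy date, [(ICD-9 code, diagnosis date), ...])
patients = [
    (True, date(2010, 1, 5), [("583.1", date(2010, 1, 20)), ("581.1", date(2010, 6, 1))]),
    (True, date(2012, 3, 1), [("583.1", date(2013, 8, 1))]),  # coded after the 1-year window
    (False, date(2011, 7, 9), [("583.1", date(2011, 8, 2))]),  # non-MN biopsy with an MN code
    (False, date(2014, 2, 2), [("585.3", date(2014, 3, 3))]),  # unrelated code
]

truth = [mn for mn, _, _ in patients]
for k in (1, 2):
    predicted = [flag_patient(b, events, min_codes=k) for _, b, events in patients]
    print(k, diagnostic_metrics(truth, predicted))
```

Raising `min_codes` in this toy example removes the false positive (SP and PPV go up) without recovering the patient coded outside the window, which mirrors the direction of the trade-off reported below, not its magnitude.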
Results
Among biopsy-proven MN cases, 59% and 86% received an MN diagnosis code within 30 days and 1 year after biopsy, respectively. The SN and SP of this clinical approach were 86% and 76%, respectively, but the PPV was only 26%. Requiring >2 codes increased SP and PPV but lowered SN. The machine learning approach found that using just 2 ICD-9 codes (581.1 or 583.1) improved SP to 94% and PPV to 58%, with SN decreasing to 83%. When ≥3 codes were required, SP was 98%, PPV 78%, and SN 64%.
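The gap between a fairly high SP and a low PPV follows from Bayes' rule: when MN is a small fraction of all biopsies, even a modest false-positive rate among non-MN patients produces many false positives relative to true positives. The sketch below plugs in the SN/SP pairs reported above; the prevalence values are hypothetical, since the share of MN among the 5542 biopsies is not stated in this abstract, so the printed PPVs are illustrative and are not the study's results.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV implied by Bayes' rule: P(MN | code positive)."""
    true_pos = sensitivity * prevalence              # MN patients who are flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # non-MN patients who are flagged
    return true_pos / (true_pos + false_pos)


# SN/SP pairs from the Results; prevalence values are assumptions for illustration.
approaches = {
    "clinical (581.1, 582.1, 583.1)": (0.86, 0.76),
    "machine learning (581.1 or 583.1)": (0.83, 0.94),
}
for name, (sn, sp) in approaches.items():
    for prevalence in (0.05, 0.10, 0.30):
        print(f"{name}: prevalence={prevalence:.2f} PPV={ppv_from_rates(sn, sp, prevalence):.2f}")
```

At any fixed prevalence, the higher SP of the machine learning code set yields a higher PPV despite its slightly lower SN, which is consistent with the direction of the improvement reported above.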
Conclusion
Our study is one of the first to leverage the EHR (ICD codes) to identify patients with biopsy-proven MN. The data-driven approach showed better overall performance than a purely clinical approach. Expanding machine learning approaches to include demographics, additional clinical data, or free text from pathology reports may further improve diagnostic performance.