Abstract: SA-PO011
Natural Language Processing for Extracting Kidney Biopsy Pathology Diagnoses: The Houston Methodist Hospital Kidney Biopsy Registry
Session Information
- Augmented Intelligence, Large Language Models, and Digital Health
October 26, 2024 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Bobart, Shane A., Houston Methodist Hospital, Houston, Texas, United States
- Hsu, Enshuo, Houston Methodist Hospital, Houston, Texas, United States
- Truong, Luan D., Houston Methodist Hospital, Houston, Texas, United States
- Waterman, Amy D., Houston Methodist Hospital, Houston, Texas, United States
- Jones, Stephen L., Houston Methodist Hospital, Houston, Texas, United States
- Shafi, Tariq, Houston Methodist Hospital, Houston, Texas, United States
Background
Kidney biopsy reports provide a detailed description of kidney pathology, but the diagnosis is not captured as searchable, discrete data in the electronic health record (EHR), requiring labor-intensive manual review and abstraction. We sought to use natural language processing (NLP) to extract kidney biopsy pathology diagnoses as an initial step toward creating an automatically updated Houston Methodist Hospital Kidney Biopsy Registry (HM-KBR).
Methods
We identified 3,087 native kidney biopsies (2,700 patients) from June 2016 to December 2023. We extracted 1,000 native kidney biopsy reports in PDF format from the Epic EHR. A domain expert (SAB) manually annotated the primary diagnosis in the 1,000 reports, and a renal pathologist (LT) validated 20% (n=200). We converted the PDFs into machine-readable free text using SQL Server (database management software) and Python (programming language). We split the biopsy reports into a training set (80%) and a test set (20%), used the bidirectional encoder representations from transformers (BERT) NLP model to extract primary diagnoses, and evaluated model performance on the held-out test set (20%).
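The 80/20 split described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the report identifiers, seed, and helper name are invented, and the registry's actual pipeline is not public.

```python
import random

def train_test_split(report_ids, test_fraction=0.2, seed=42):
    """Reproducibly shuffle report IDs and split into (train, test) lists."""
    ids = list(report_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

# 1,000 hypothetical report IDs, mirroring the abstract's report count
reports = [f"report_{i:04d}" for i in range(1000)]
train, test = train_test_split(reports)
print(len(train), len(test))  # 800 200
```

A fixed random seed keeps the training and test sets stable across runs, which matters when model results must be reproduced for validation.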
The evaluation metrics included precision (the proportion of extracted diagnoses that were correct), recall (the proportion of true diagnoses that the model extracted), the F1 score (the harmonic mean of precision and recall, ranging from 0 to 1, with 1 indicating perfect performance), and the area under the receiver operating characteristic curve (AUROC; overall discrimination, 1.0 is best). Because of the limited size of the training set, we restricted the evaluation metrics to diagnosis types present in at least 20 reports.
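For a single diagnosis class, these metrics follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). A minimal sketch, with counts invented purely for illustration (they are not the study's actual results):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN); F1 = their harmonic mean."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Invented counts for one hypothetical diagnosis class
p, r, f1 = precision_recall_f1(tp=40, fp=20, fn=21)
```

In a multi-class setting like diagnosis extraction, per-class scores are typically averaged (e.g., macro- or micro-averaged) into a single F1.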
Results
The median age was 57 years; 50% of patients were female, 28% Black, and 23% Hispanic. Agreement between the two reviewers in the validation sample, assessed by Cohen's kappa statistic, was 0.76, indicating substantial agreement. The NLP-extracted diagnoses showed an F1 score of 0.66 and an AUROC of 0.93 (Table 1).
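Cohen's kappa compares observed inter-rater agreement to the agreement expected by chance from each rater's label frequencies. A self-contained sketch, using invented toy label pairs (not the study's actual annotations):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from the raters' marginal label frequencies."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example: two raters labeling 10 hypothetical biopsy reports
a = ["IgA", "IgA", "FSGS", "DN", "FSGS", "IgA", "DN", "DN", "FSGS", "IgA"]
b = ["IgA", "FSGS", "FSGS", "DN", "FSGS", "IgA", "DN", "IgA", "FSGS", "IgA"]
kappa = cohens_kappa(a, b)
```

Unlike raw percent agreement, kappa discounts matches two raters would produce by chance alone, which is why it is the standard check for annotation quality.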
Conclusion
Our preliminary data show that an NLP-based model can accurately and scalably extract the primary diagnosis from free-text kidney biopsy pathology reports.