Kidney Week

Abstract: SA-PO011

Natural Language Processing for Extracting Kidney Biopsy Pathology Diagnoses: The Houston Methodist Hospital Kidney Biopsy Registry

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science

Authors

  • Bobart, Shane A., Houston Methodist Hospital, Houston, Texas, United States
  • Hsu, Enshuo, Houston Methodist Hospital, Houston, Texas, United States
  • Truong, Luan D., Houston Methodist Hospital, Houston, Texas, United States
  • Waterman, Amy D., Houston Methodist Hospital, Houston, Texas, United States
  • Jones, Stephen L., Houston Methodist Hospital, Houston, Texas, United States
  • Shafi, Tariq, Houston Methodist Hospital, Houston, Texas, United States

Background

Kidney biopsy reports provide a detailed description of kidney pathology, but the diagnosis is not captured as searchable, discrete data in the electronic health record (EHR), so identifying it requires labor-intensive manual review and abstraction. We sought to use natural language processing (NLP) to extract kidney biopsy pathology diagnoses as an initial step toward an automatically updated Houston Methodist Hospital Kidney Biopsy Registry (HM-KBR).

Methods

We identified 3,087 native kidney biopsies (2,700 patients) performed from June 2016 to December 2023 and extracted 1,000 native kidney biopsy reports in PDF format from the Epic EHR. A domain expert (SAB) manually annotated the primary diagnosis in all 1,000 reports, and a renal pathologist (LT) validated 20% of them (n=200). We converted the PDFs into machine-readable free text using SQL Server (database management software) and Python (programming language). We split the biopsy reports into a training set (80%) and a stand-alone test set (20%), used the training set to train a Bidirectional Encoder Representations from Transformers (BERT) NLP model to extract primary diagnoses, and evaluated the model's performance on the test set.
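The abstract does not say how the 80/20 split was made. Because 3,087 biopsies came from 2,700 patients, some patients may have contributed more than one report, and a purely random split by report could place the same patient's reports in both the training and test sets. The sketch below shows one leakage-avoiding option, a patient-grouped split. It is an illustration under stated assumptions, not the authors' procedure: the function name `split_by_patient`, the `(report_id, patient_id)` input format, and the default seed are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(reports, train_fraction=0.8, seed=0):
    """Split (report_id, patient_id) pairs roughly 80/20 so that all reports
    from one patient land in the same set.

    Hypothetical helper: the abstract does not state how its split was made.
    """
    # Group report IDs by patient.
    by_patient = defaultdict(list)
    for report_id, patient_id in reports:
        by_patient[patient_id].append(report_id)

    # Sort first so the seeded shuffle is reproducible regardless of input order.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Assign whole patients, so the report-level ratio is only approximately 80/20.
    n_train_patients = round(len(patients) * train_fraction)
    train = [r for p in patients[:n_train_patients] for r in by_patient[p]]
    test = [r for p in patients[n_train_patients:] for r in by_patient[p]]
    return train, test
```

Splitting at the patient level trades an exact 80/20 report ratio for a test set that contains no reports from patients the model saw during training.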
The evaluation metrics were precision (positive predictive value: the proportion of predicted diagnoses that were correct), recall (sensitivity: the proportion of true diagnoses that the model found), F1 score (the harmonic mean of precision and recall, ranging from 0 to 1, with 1 indicating perfect performance), and AUROC (area under the receiver operating characteristic curve, a measure of overall discrimination; 1.0 is best and 0.5 is chance). Because the training set was still small, we restricted the evaluation metrics to diagnosis types present in at least 20 reports.
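The metric definitions above can be made concrete with a short, dependency-free sketch. It computes one-vs-rest precision, recall, and F1 for each diagnosis, keeps only diagnoses with at least `min_support` true labels (20 in the abstract), and computes a binary AUROC from model scores as the normalized Mann-Whitney U statistic. The function names, the use of a macro (unweighted) average to summarize F1 across diagnoses, and counting support on the labels passed in are assumptions; the abstract does not say how its single F1 and AUROC values were aggregated.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, min_support=20):
    """One-vs-rest precision, recall and F1 for each diagnosis that appears
    at least `min_support` times in y_true (hypothetical helper)."""
    support = Counter(y_true)
    kept = sorted(label for label, n in support.items() if n >= min_support)
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in kept:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
        recall = tp / (tp + fn) if tp + fn else 0.0     # sensitivity
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)           # harmonic mean of the two
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-diagnosis F1 (assumed summary, not from the abstract)."""
    if not scores:
        raise ValueError("no diagnosis met the support threshold")
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def binary_auroc(is_positive, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as 1/2 (normalized Mann-Whitney U)."""
    pos = [s for y, s in zip(is_positive, scores) if y]
    neg = [s for y, s in zip(is_positive, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC loop costs O(positives × negatives), which is acceptable for a test set of about 200 reports; in practice, scikit-learn's `precision_recall_fscore_support` and `roc_auc_score` are the usual library routes.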

Results

The median age was 57 years; 50% were female, 28% were Black, and 23% were Hispanic. Agreement between the two reviewers on the validation sample, assessed with Cohen's kappa statistic, was 0.76 (excellent). The NLP-extracted diagnoses achieved an F1 score of 0.66 and an AUROC of 0.93 (Table 1).
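Cohen's kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of items on which the two raters match and p_e is the chance agreement implied by each rater's label frequencies. A minimal sketch of that formula follows; the `cohens_kappa` name is hypothetical, and the diagnosis labels in the usage check are made-up toy data, not the study's annotations.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # p_o: fraction of items where the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return float("nan")  # undefined: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, with labels `["FSGS", "FSGS", "IgAN", "MN"]` and `["FSGS", "IgAN", "IgAN", "MN"]`, p_o = 3/4 and p_e = 5/16, giving κ = 7/11 ≈ 0.64: lower than the raw 75% agreement because some of that agreement would be expected by chance.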

Conclusion

Our preliminary data show an accurate and scalable NLP-based model for extracting the primary diagnosis from free-text kidney biopsy pathology reports.