Abstract: SA-PO611
Leveraging Statistical Natural Language Processing (NLP) to Surface Clinically Relevant Biomarkers in Pediatric Nephrotic Syndrome
Session Information
- Noncystic Mendelian Diseases
November 04, 2017 | Location: Hall H, Morial Convention Center
Abstract Time: 10:00 AM - 10:00 AM
Category: Genetic Diseases of the Kidney
- 802 Non-Cystic Mendelian Diseases
Authors
- Hildebrandt, Friedhelm, Boston Children's Hospital, Boston, Massachusetts, United States
- Lovric, Svjetlana, Boston Children's Hospital, Boston, Massachusetts, United States
- Shril, Shirlee, Boston Children's Hospital, Boston, Massachusetts, United States
- Mcneillie, Patrick, IBM Watson Health, Bethesda, Maryland, United States
- Dankwa-Mullan, Irene, IBM Watson Health, Bethesda, Maryland, United States
- Leibovitz, Evan, IBM, Cambridge, Massachusetts, United States
- Scanlan, Kevin J, IBM Watson Health, Bethesda, Maryland, United States
Background
Most pediatric patients with idiopathic nephrotic syndrome (NS) have minimal change disease, which is generally responsive to steroid therapy. Patients with genetic forms of steroid-resistant nephrotic syndrome (SRNS) are unresponsive to steroid therapy. Thus, therapeutic decisions are based on the underlying etiology, renal histology, and genetic screening. While the mechanisms of NS are not well understood, recent advances in molecular genetics have shown that single-gene defects are responsible for 25-33% of all cases of isolated and syndromic SRNS. Biomarkers are of significant value in the clinical domain, offering information on disease diagnosis, prognosis, risk assessment, and treatment efficacy. However, extracting biomarkers from unstructured literature is time-consuming and requires domain expertise. This study evaluates the potential for NLP and cognitive analytics to facilitate a review of the SRNS literature and accelerate discovery of potential biomarkers.
Methods
Boston Children’s Hospital (BCH) and IBM collaborated to train a machine learning model to identify appropriate entities and relationships across literature articles focused on SRNS. The team identified and labeled 11 entity types and 50 relationship types across 180 literature articles. The trained model was tested against the unstructured text of articles and outputs were analyzed for accuracy and precision.
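As an illustration only, a labeled entity and relationship of the kind described above might be represented as follows. The abstract does not list its 11 entity types or 50 relationship types, so the labels "Gene", "Disease", and "causes", the example sentence, and all class and function names here are assumptions rather than the study's actual annotation schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str    # surface form as it appears in the sentence
    label: str   # hypothetical entity type, e.g. "Gene"
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention


@dataclass(frozen=True)
class Relation:
    label: str       # hypothetical relationship type, e.g. "causes"
    source: Entity
    target: Entity


def find_entity(sentence: str, text: str, label: str) -> Entity:
    """Locate the first occurrence of `text` and wrap it as a labeled Entity."""
    start = sentence.index(text)
    return Entity(text, label, start, start + len(text))


# Illustrative sentence, not taken from the study corpus.
sentence = "Mutations in NPHS2 cause steroid-resistant nephrotic syndrome."
gene = find_entity(sentence, "NPHS2", "Gene")
disease = find_entity(sentence, "steroid-resistant nephrotic syndrome", "Disease")
relation = Relation("causes", gene, disease)

print(relation.source.text, relation.label, relation.target.text)
```

Keeping character offsets with each entity is one way to support passage-level references back to the source text.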
Results
Comparison of the trained model's output with the expert output showed 100% precision (23/23) and 92.0% sensitivity (23/25). One false negative was due to a lack of co-reference resolution, which links the lexical subject across multiple sentences. The other false negative was due to the gene not being identified as relevant. The model took less than 30 seconds to identify the relevant biomarkers and provided passage-level references to enable seamless follow-up by the researcher.
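The reported figures can be checked with a short calculation. This is a minimal sketch that assumes the counts implied by the fractions above: 23 true positives, 0 false positives, and 2 false negatives. The function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of model-identified items that the experts also identified."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of expert-identified items that the model recovered (recall)."""
    return tp / (tp + fn)


# Counts inferred from the reported fractions 23/23 and 23/25.
tp, fp, fn = 23, 0, 2

print(f"precision   = {precision(tp, fp):.1%}")    # prints "precision   = 100.0%"
print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # prints "sensitivity = 92.0%"
```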
Conclusion
The machine learning model provided rapid and accurate extraction of potential molecular biomarkers for NS. With additional training, this model could be extended to other rare diseases, accelerating mutational analysis for therapeutic interventions.
Funding
- Commercial Support – IBM