Abstract: TH-PO045
Using an Artificial Intelligence Tool Incorporating Natural Language Processing to Identify Low-Prevalence Cases of ANCA-Associated Vasculitis in Electronic Health Records
Session Information
- AI, Digital Health, Data Science - I
November 02, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- van Leeuwen, Jolijn R., Leids Universitair Medisch Centrum, Leiden, Zuid-Holland, Netherlands
- Penne, Erik Lars, Noordwest Ziekenhuisgroep, Alkmaar, Noord-Holland, Netherlands
- Rabelink, Ton J., Leids Universitair Medisch Centrum, Leiden, Zuid-Holland, Netherlands
- Knevel, Rachel, Leids Universitair Medisch Centrum, Leiden, Zuid-Holland, Netherlands
- Teng, Yoe Kie Onno, Leids Universitair Medisch Centrum, Leiden, Zuid-Holland, Netherlands
Background
Anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) is a rare, life-threatening, systemic autoimmune disease. Because of its low prevalence and heterogeneous registration, there is an urgent need to improve identification of AAV patients within the electronic health record (EHR) systems of health organizations to facilitate clinical research.
Methods
Our aim was to identify low-prevalence AAV patients with high sensitivity within large EHR systems (>2,000,000 records) using an artificial intelligence (AI) search tool. We combined a search on structured and unstructured data with natural language processing (NLP)-based exclusion. We developed the method in an academic center with an established AAV training set (n=203) and validated it in a non-academic center with a validation set (n=84). We anonymously reviewed all identified patient records for AAV diagnosis.
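The two-stage idea described above (a broad, sensitivity-oriented search followed by NLP-based exclusion) can be sketched roughly as follows. This is a minimal illustration only, not the authors' tool: the keyword lists, the record fields (`note`, `anca_positive`), and the simple negation rule are all hypothetical stand-ins, since the abstract does not publish the actual queries or the NLP model.

```python
import re

# Hypothetical keyword lists; the abstract does not publish the actual queries.
DISEASE_TERMS = ("anca-associated vasculitis", "granulomatosis with polyangiitis")
NEGATION_CUES = ("no evidence of", "ruled out", "not consistent with")


def matches_search(record):
    """Broad search over free text and a structured lab flag (assumed fields)."""
    text = record["note"].lower()
    in_text = any(term in text for term in DISEASE_TERMS)
    in_labs = record.get("anca_positive", False)
    return in_text or in_labs


def is_negated_only(record):
    """True when every sentence that mentions the disease also contains a negation cue."""
    sentences = re.split(r"[.!?]\s*", record["note"].lower())
    mentions = [s for s in sentences if any(t in s for t in DISEASE_TERMS)]
    if not mentions:
        return False  # nothing to negate; keep the record for manual review
    return all(any(cue in s for cue in NEGATION_CUES) for s in mentions)


def identify(records):
    """Return IDs that match the search and survive the exclusion step."""
    return [r["id"] for r in records
            if matches_search(r) and not is_negated_only(r)]


# Toy records, invented purely for illustration.
records = [
    {"id": 1, "note": "Biopsy confirms ANCA-associated vasculitis.", "anca_positive": True},
    {"id": 2, "note": "No evidence of ANCA-associated vasculitis. Follow-up in 6 months.", "anca_positive": False},
    {"id": 3, "note": "Routine check-up.", "anca_positive": False},
    {"id": 4, "note": "Vasculitis work-up pending.", "anca_positive": True},
]
print(identify(records))
```

In this toy setup the exclusion step only removes records whose disease mentions are all negated, so it can raise PPV without dropping records that were retrieved through structured data alone.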
Results
The final search strategy combined four queries on disease description, laboratory measurements, medication and specialisms. In the training center, this search identified 608 patients, of whom 346 were AAV patients upon manual review. It retrieved 197 of the 203 patients in the training set, a sensitivity of 97%. NLP-based exclusion reduced this to 444 patients, of whom 339 were AAV patients, increasing the positive predictive value (PPV) from 57% to 78% at a sensitivity of 96%. In the validation center, the search strategy identified 333 patients, of whom 194 were AAV patients, including 82/84 (98%) patients of the validation set. After NLP-based exclusion, 223 patients remained, including 196 AAV patients, improving PPV from 58% to 86% with a sensitivity of 98%. Our identification method outperformed ICD-10 coding, predominantly in identifying myeloperoxidase (MPO)-positive AAV patients and patients with few specialisms involved.
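As a worked check of how the two metrics are defined here, the short sketch below recomputes PPV and sensitivity for the initial search (before NLP-based exclusion) from the counts reported above. PPV is taken against manual review of all retrieved patients, and sensitivity against the known training or validation set; the function and variable names are illustrative only.

```python
def ppv(confirmed_cases, retrieved):
    """Positive predictive value: confirmed AAV patients / all retrieved patients."""
    return confirmed_cases / retrieved


def sensitivity(found_in_set, set_size):
    """Sensitivity: reference-set patients retrieved / reference-set size."""
    return found_in_set / set_size


# Counts reported in the abstract for the initial search in each center.
centers = {
    "training": {"retrieved": 608, "aav": 346, "set_found": 197, "set_size": 203},
    "validation": {"retrieved": 333, "aav": 194, "set_found": 82, "set_size": 84},
}

for name, c in centers.items():
    print(f"{name}: PPV {ppv(c['aav'], c['retrieved']):.0%}, "
          f"sensitivity {sensitivity(c['set_found'], c['set_size']):.0%}")
# training: PPV 57%, sensitivity 97%
# validation: PPV 58%, sensitivity 98%
```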
Conclusion
We demonstrated excellent performance of an AI-based identification method incorporating NLP to identify AAV patients in EHRs, and we validated its applicability and transportability in a second center. This method can accelerate research efforts while avoiding the limitations of ICD-10-based registration.
Funding
- Commercial Support – Vifor Pharma