Abstract: SA-PO007
Optimizing Triage of Emergency, Urgent, and Elective Inbox Messages in Nephrology Using Artificial Intelligence (AI)
Session Information
- Augmented Intelligence, Large Language Models, and Digital Health
October 26, 2024 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Pham, Justin, Mayo Clinic College of Medicine and Science, Rochester, Minnesota, United States
- Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Miao, Jing, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background
In nephrology clinics, the workload burden associated with responding to patient inbox messages is significant. Some cases require emergent attention, such as those with severe electrolyte disorders or significant acute kidney injury. Efficient triage of these messages is crucial for ensuring timely medical attention based on urgency, reducing workload, and ultimately improving patient care. The integration of large language models like ChatGPT-4 into clinical practice has the potential to enhance operational efficiency by automating the triage process. This study aims to evaluate the accuracy of ChatGPT-4 in categorizing patient inbox messages in a nephrology clinic setting.
Methods
Two nephrologists wrote 150 patient inbox messages based on cases encountered in everyday practice at a nephrology outpatient clinic: 50 simulated emergencies, 50 urgent cases, and 50 non-urgent cases. The messages were then submitted to ChatGPT-4 for independent triage into the same three categories, and the process was repeated after two weeks. ChatGPT responses were graded as correct, overestimation (higher priority than intended), or underestimation (lower priority than intended). Cohen's kappa statistic (κ) was calculated to quantify inter-rater and intra-rater agreement.
Results
ChatGPT correctly triaged 140/150 messages (93.3%; κ = 0.9) in each trial, with an intra-rater agreement rate of 92% (κ = 0.88) across trials. It had the highest accuracy for emergent messages, followed by non-urgent, then urgent ones. There were more instances of overestimation than underestimation. Subcategorical results are included in Table 1.
Conclusion
ChatGPT-4 demonstrated near-perfect inter-rater agreement with nephrologists and high internal consistency in triaging 150 simulated patient inbox messages based on cases encountered in an outpatient clinic. The study highlights the potential for AI-driven triage systems to enhance operational efficiency and improve care for patients with kidney diseases. More research with larger datasets across medical specialties is needed to validate the generalizability of these findings.
Table 1
| Category | Number | Trial 1 Correct | Trial 1 Overestimate | Trial 1 Underestimate | Trial 2 Correct | Trial 2 Overestimate | Trial 2 Underestimate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Non-urgent | 50 | 49 | 1 | 0 | 46 | 4 | 0 |
| Urgent | 50 | 45 | 3 | 2 | 45 | 5 | 0 |
| Emergency | 50 | 46 | 0 | 4 | 49 | 0 | 1 |
| All | 150 | 140 | 4 | 6 | 140 | 9 | 1 |
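The reported κ = 0.9 for Trial 1 can be reproduced from the table with the standard Cohen's kappa formula, κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows; the split of the off-diagonal errors between the two possible wrong categories is an assumption (the table reports only overestimate/underestimate totals, not which category each misclassified message received):

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = true, cols = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n          # observed agreement
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Trial 1 counts from Table 1; rows/columns ordered non-urgent, urgent, emergency.
# The assignment of off-diagonal errors to specific columns is illustrative only.
trial1 = [
    [49, 1, 0],   # non-urgent: 49 correct, 1 overestimated
    [2, 45, 3],   # urgent: 2 underestimated, 45 correct, 3 overestimated
    [0, 4, 46],   # emergency: 4 underestimated, 46 correct
]
print(round(cohen_kappa(trial1), 2))  # 0.9
```

Because the true categories are balanced (50 messages each), the chance-agreement term p_e equals 1/3 regardless of how the off-diagonal errors are split, so κ = (140/150 − 1/3) / (2/3) = 0.9 matches the reported value under any assumed split.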