Abstract: TH-PO005
ChatGPT vs. a First-Year Nephrology Fellow in Electrolyte and Acid-Base Disorders
Session Information
- AI, Digital Health, Data Science - I
November 02, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Mekraksakit, Poemlarp, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Krisanapan, Pajaree, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Kalantari, Kambiz, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background
ChatGPT is a leading large language model known for its ability to generate human-like responses across a wide range of tasks. This study aims to assess ChatGPT's proficiency in addressing electrolyte and acid-base disorders in Nephrology.
Methods
We used the nephSAP and KSAP question banks, provided by the American Society of Nephrology (ASN), to assess ChatGPT's accuracy in answering questions about electrolyte and acid-base disorders. Questions containing images were excluded because ChatGPT cannot process images. We evaluated a total of 152 questions: 122 from KSAP and 30 from nephSAP. ChatGPT was tested twice, with the first and second runs conducted 1 to 2 weeks apart. For comparison, the same questions were answered by a first-year Nephrology fellow who had studied this topic extensively. The complete question sets are available at https://education.asn-online.org/.
Results
On the 122 KSAP questions, ChatGPT achieved accuracies of 32.8% and 37.7% on the first and second runs, respectively, whereas the first-year Nephrology fellow achieved an accuracy of 76.2%. On the 30 nephSAP questions, ChatGPT demonstrated an accuracy of 50% on the first run and 53.3% on the second; the first-year Nephrology fellow correctly answered 83% of these questions. Notably, ChatGPT changed its answers on the second run for 56 of the 152 questions (36.8%). Of these 56 questions, ChatGPT changed its answer from incorrect to correct in 18 cases and from correct to incorrect in 10; in the remaining 28 cases, it switched from one incorrect option to another.
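The reported percentages can be cross-checked against the underlying counts. A minimal sketch is below; the raw correct-answer counts (40, 46, 15, 16) are reconstructed from the rounded percentages in the abstract and are an assumption, not figures stated by the authors.

```python
# Illustrative check of the reported accuracies.
# Correct-answer counts are reconstructed from the rounded
# percentages (assumption), not taken from the abstract itself.
KSAP_TOTAL = 122
NEPHSAP_TOTAL = 30

def accuracy_pct(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

# ChatGPT on KSAP: 40/122 and 46/122 reproduce 32.8% and 37.7%.
assert accuracy_pct(40, KSAP_TOTAL) == 32.8
assert accuracy_pct(46, KSAP_TOTAL) == 37.7

# ChatGPT on nephSAP: 15/30 and 16/30 reproduce 50.0% and 53.3%.
assert accuracy_pct(15, NEPHSAP_TOTAL) == 50.0
assert accuracy_pct(16, NEPHSAP_TOTAL) == 53.3

# Answers changed between runs: 56 of 152 questions is 36.8%.
assert accuracy_pct(56, KSAP_TOTAL + NEPHSAP_TOTAL) == 36.8
```

Since all assertions pass, the rounded percentages in the text are internally consistent with these reconstructed counts.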
Conclusion
ChatGPT's proficiency in addressing electrolyte and acid-base disorders in nephrology is limited. It did not reach the 75% minimum passing threshold set by the ASN for nephrologists, its accuracies were lower than those of a dedicated first-year Nephrology fellow, and its responses were inconsistent across runs. Therefore, ChatGPT is not a suitable replacement for human clinicians in this clinical setting.