Abstract: FR-PO872
Can ChatGPT Keep up with Obstetric and Gynecologic Nephrology? Assessing Its Proficiency in Key Concepts
Session Information
- Women's Health and Kidney Diseases
November 03, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Women's Health and Kidney Diseases
- 2200 Women's Health and Kidney Diseases
Authors
- Gonzalez Suarez, Maria Lourdes, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Garovic, Vesna D., Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Kattah, Andrea G., Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background
ChatGPT is a language model known for its ability to generate human-like responses across a variety of tasks. Despite ongoing discussion about ChatGPT's potential to replace clinicians in clinical contexts, its ability to address essential concepts in a multidisciplinary field such as obstetric and gynecologic nephrology has not been thoroughly evaluated. The purpose of this study is to evaluate ChatGPT's proficiency in answering fundamental questions on the diagnosis, treatment, and management of hypertension in pregnancy.
Methods
Using the Nephrology Self-Assessment Program (NephSAP) issues V15N2 and V21N4 (questions 25-30), we assessed ChatGPT's accuracy in answering fundamental questions related to obstetric and gynecologic nephrology. Questions containing images were excluded, leaving 30 questions for analysis. Each question set was run 3 times using ChatGPT (Mar 14 version, OpenAI), with runs conducted 2 weeks apart, and we evaluated the agreement between the initial and subsequent runs.
Results
ChatGPT achieved accuracies of 66.6% on the 1st run and 80% on both the 2nd and 3rd runs for the NephSAP questions. ChatGPT demonstrated a higher level of agreement across runs for correct answers than for incorrect ones. However, accuracy rates of 66.6% and 80% still leave room for improvement, particularly for complex and specialized medical topics such as obstetric and gynecologic nephrology. ChatGPT itself acknowledged these results (Figure 1).
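As an illustrative aside (not part of the study's materials), the reported figures correspond to 20/30 and 24/30 correct answers. A minimal sketch, using hypothetical graded responses, of how per-run accuracy and inter-run agreement of the kind described above can be computed:

```python
# Illustrative sketch with hypothetical data (not the authors' code):
# per-run accuracy and pairwise inter-run agreement.

def accuracy(graded):
    """Fraction of questions answered correctly in one run."""
    return sum(graded) / len(graded)

def agreement(run_a, run_b):
    """Fraction of questions on which two runs gave the same response."""
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)

# Hypothetical gradings for 30 questions; True = answered correctly.
run1 = [True] * 20 + [False] * 10   # 20/30 correct, as on the 1st run
run2 = [True] * 24 + [False] * 6    # 24/30 correct, as on the 2nd and 3rd runs

print(round(accuracy(run1), 3))     # 0.667
print(round(accuracy(run2), 3))     # 0.8
print(agreement(run2, run2))        # 1.0 (a run agrees perfectly with itself)
```

The `agreement` helper here compares whether two runs matched question by question; the study's actual agreement analysis may have been computed differently.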
Conclusion
On its 1st attempt, with an accuracy of 66.6%, ChatGPT's proficiency in addressing fundamental queries related to obstetric and gynecologic nephrology management fell below the minimum passing threshold of 75% set by the American Society of Nephrology (ASN) for nephrologists. While ChatGPT can provide some useful information, it may not be as reliable or comprehensive as a human expert in this specific field.
Figure 1. ChatGPT's response on its performance.