Kidney Week

Abstract: PUB079

Alignment of ChatGPT with Expert Opinion in Nephrology Polls

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science

Authors

  • Pham, Justin, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Miao, Jing, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background

Healthcare professionals often face complex clinical scenarios without straightforward solutions, necessitating professional collaboration. This is common in nephrology, where soliciting peer insight is crucial for informed decision-making. ChatGPT, a sophisticated language model, has demonstrated its problem-solving utility in several fields, but its alignment with prevailing medical opinion in intricate clinical scenarios remains unexplored. This study evaluates how closely ChatGPT's answers to real-world clinical questions align with the nephrology community's prevailing opinions.

Methods

Nephrology polls were collected from the social media site X using the hashtag #AskRenal, yielding 271 questions. These were presented to ChatGPT-4, which generated answers without prior knowledge of the poll outcomes. The process was repeated one week later with the same questions in randomized order to assess internal consistency. ChatGPT-4's responses from the two rounds were compared with the poll results (inter-rater) and with each other (intra-rater) using Cohen's kappa statistic (κ). The questions were also grouped into seven subject-matter categories for subgroup analysis.
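As a minimal sketch of this agreement analysis: the abstract does not specify the tooling used, but Cohen's kappa for categorical answers can be computed with scikit-learn as below. The answer labels, variable names, and data here are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch of the inter-rater and intra-rater agreement analysis.
# Assumptions: each poll's prevailing answer and ChatGPT-4's two answers are
# categorical labels (e.g., "A"-"D"). All data below are hypothetical.
from sklearn.metrics import cohen_kappa_score

poll_majority = ["A", "B", "A", "C"]   # prevailing poll answer per question (hypothetical)
gpt_round1    = ["A", "B", "C", "C"]   # ChatGPT-4 answers, first round (hypothetical)
gpt_round2    = ["A", "B", "C", "C"]   # same questions, randomized order, one week later

def pct_agree(a, b):
    """Raw percent agreement between two answer lists."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Inter-rater: ChatGPT-4 vs. poll majority, each round separately
for name, answers in [("Round 1", gpt_round1), ("Round 2", gpt_round2)]:
    print(f"{name}: {pct_agree(answers, poll_majority):.1%} "
          f"(kappa={cohen_kappa_score(answers, poll_majority):.2f})")

# Intra-rater: round 1 vs. round 2 (internal consistency)
print(f"Intra-rater: {pct_agree(gpt_round1, gpt_round2):.1%} "
      f"(kappa={cohen_kappa_score(gpt_round1, gpt_round2):.2f})")
```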

Results

In the first round of inquiry, 60.2% of ChatGPT's responses matched the poll results (κ=0.42); in the second round, 63.1% matched (κ=0.46). The two rounds agreed with each other 90.4% of the time (κ=0.86). The table below presents the subgroup results.
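For context, Cohen's kappa corrects the observed agreement p_o for chance agreement p_e. Back-solving from the reported intra-rater figures (an illustrative derivation; the abstract does not report p_e) gives the implied chance agreement:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Rightarrow\quad
0.86 = \frac{0.904 - p_e}{1 - p_e}
\quad\Rightarrow\quad
p_e \approx 0.31
```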

Conclusion

ChatGPT-4 demonstrates moderate capability in replicating prevailing professional opinion in nephrology polls, with high internal consistency but varying performance across question categories. While AI-based language models have the potential to assist with decision-making in complex clinical scenarios, their reliability has yet to be fully established, and they should be integrated cautiously.

Agreement by question category
Category                                  | Round 1      | Round 2      | Intra-rater
CKD, ESRD, dialysis, & transplant         | 62% (κ=0.4)  | 64% (κ=0.5)  | 90% (κ=0.9)
Electrolyte & acid-base disorders         | 62% (κ=0.5)  | 54% (κ=0.4)  | 92% (κ=0.9)
Glomerular disease, AKI, & critical care  | 51% (κ=0.3)  | 58% (κ=0.4)  | 87% (κ=0.8)
Mineral, bone, & stone diseases           | 78% (κ=0.7)  | 89% (κ=0.8)  | 89% (κ=0.8)
Pharmacology                              | 65% (κ=0.5)  | 65% (κ=0.5)  | 100% (κ=1.0)
Tubular, interstitial, & cystic disorders | 50% (κ=0.1)  | 25% (κ=-0.1) | 75% (κ=0.6)
Other                                     | 73% (κ=0.6)  | 80% (κ=0.7)  | 93% (κ=0.9)