
Kidney Week

Abstract: SA-PO001

Nephrology Tools: Using Chatbots for Image Interpretation and Answering Questions

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science

Authors

  • Garcia Valencia, Oscar Alejandro, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Krisanapan, Pajaree, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Suppadungsuk, Supawadee, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Miao, Jing, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background

Effective medical diagnosis and treatment planning rely heavily on clinical imaging. As of September 2023, ChatGPT-4 had broadened its functionality to allow image interpretation, providing comprehensive explanations and problem-solving insights. However, the effectiveness of chatbots in image interpretation remains understudied. We aimed to evaluate the performance of leading chatbots (i.e., GPT-4, Bard AI, and Bing Chat) in interpreting nephrology images and answering related test questions.

Methods

We assessed 57 nephrology test questions with their associated images from the Nephrology Self-Assessment Program and Kidney Self-Assessment Program tests. This set comprised 19 kidney histopathology images, 28 radiology images, and 10 images from miscellaneous categories. We omitted the image descriptions to focus solely on the chatbots' interpretive abilities. Each question was analyzed twice with GPT-4, Bing Chat, and Bard AI. We then calculated and compared the accuracy and concordance rates of both image interpretation and question answering for these AI models.
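The two metrics above can be sketched as follows. This is a hypothetical illustration, not the study's analysis code: accuracy is taken as the fraction of correct responses pooled across both runs, and concordance as the fraction of questions where the two runs agreed, which is one common reading of these terms; the data below are toy values, not study results.

```python
# Hypothetical sketch of the accuracy/concordance calculation (assumption:
# accuracy pools both runs; concordance = agreement between the two runs).
# run1/run2 hold booleans for whether each run's answer was correct.

def accuracy(run1, run2):
    """Fraction of all responses (both runs pooled) that were correct."""
    responses = run1 + run2
    return sum(responses) / len(responses)

def concordance(run1, run2):
    """Fraction of questions where the two runs agreed (both correct or both incorrect)."""
    agree = sum(a == b for a, b in zip(run1, run2))
    return agree / len(run1)

# Toy example with 5 questions (illustrative data only)
run1 = [True, True, False, True, False]
run2 = [True, False, False, True, True]
print(accuracy(run1, run2))     # 0.6
print(concordance(run1, run2))  # 0.6
```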

Results

Regarding image interpretation, GPT-4 showed 79% overall accuracy, outperforming Bard AI’s 51% and Bing Chat’s 35% (p<0.001). All chatbots displayed similar performance across image types (p=0.57, 0.39, and 0.38 for GPT-4, Bard AI, and Bing Chat, respectively). On image-related questions, GPT-4, Bard AI, and Bing Chat yielded comparable results, with overall accuracy rates of 60%, 53%, and 61% (p>0.05) and concordance rates of 75%, 68%, and 74% (p>0.05), respectively. Notably, GPT-4 and Bard AI were more likely to provide a correct answer when their image interpretation was accurate (correct responses with accurate vs inaccurate interpretation: 60% vs 37%, p=0.01 for GPT-4; 60% vs 27%, p<0.001 for Bard AI), while Bing Chat’s question-answering accuracy did not differ with the accuracy of its image interpretation (58% vs 51%, p=0.50).
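A comparison of accuracy rates across models could be sketched as below. The abstract reports p-values but does not name the test used, so the chi-square test of independence here is an assumption, and the correct/incorrect counts are illustrative tallies chosen to roughly match the reported 79%/51%/35% interpretation accuracies (57 questions x 2 runs = 114 responses per model), not the study's actual data.

```python
# Hypothetical sketch: chi-square test of independence on correct/incorrect
# counts per model (the test choice and the counts are assumptions, not
# taken from the study).
from scipy.stats import chi2_contingency

# rows: GPT-4, Bard AI, Bing Chat; columns: correct, incorrect (illustrative)
table = [
    [90, 24],   # ~79% accuracy
    [58, 56],   # ~51% accuracy
    [40, 74],   # ~35% accuracy
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2g}")
```

With counts this far apart, the test yields p<0.001, matching the direction of the reported result.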

Conclusion

These results highlight the challenges of developing AI-driven chatbots for medical image interpretation in nephrology, with GPT-4 showing remarkable potential. The study provides valuable insights for further refinement of AI models, emphasizing the importance of accuracy, especially in scenarios where medical decisions rely on precise image interpretation.