Kidney Week

Abstract: SA-PO003

Enhancing Large Language Models (LLM) Performance in Nephrology through Prompt Engineering: A Comparative Analysis of ChatGPT-4 Responses in Answering AKI and Critical Care Nephrology Questions

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science

Authors

  • Sheikh, M. Salman, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Qureshi, Fawad, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Abdelgadir, Yasir, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Kashani, Kianoush, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background

Large Language Models (LLMs) have significantly advanced the field of artificial intelligence (AI). The effectiveness of LLMs is substantially influenced by the structure and formulation of input queries, a process known as prompt engineering. Prompt engineering techniques such as the chain of thought approach, which prompts the model to reason through a problem step by step before answering, have shown improved accuracy compared with regular prompts. This study investigates the impact of the chain of thought approach on the accuracy of ChatGPT-4 in addressing acute kidney injury (AKI) and critical care nephrology questions.
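The distinction between the two prompting styles can be illustrated with a minimal sketch. The question text and prompt wording below are hypothetical placeholders, not actual KSAP/NephSAP items or the study's exact prompts:

```python
# Illustrative sketch of regular vs. chain of thought prompting.
# The question and wording are hypothetical, not study materials.

QUESTION = (
    "A 62-year-old patient develops oliguria 48 hours after cardiac "
    "surgery. Which diagnostic step is most appropriate?\n"
    "A) ... B) ... C) ... D) ..."
)

def regular_prompt(question: str) -> str:
    """Regular prompting: pass the original question through unchanged."""
    return question

def chain_of_thought_prompt(question: str) -> str:
    """Chain of thought prompting: instruct the model to reason step by
    step through the clinical findings before committing to an answer."""
    return (
        f"{question}\n\n"
        "Think through this problem step by step: summarize the clinical "
        "findings, consider each answer choice in turn, and only then "
        "state the single best answer."
    )
```

The only difference between the two conditions is the appended reasoning instruction; the underlying question is identical, which is what makes a paired (question-by-question) comparison possible.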

Methods

We presented ChatGPT-4 with 101 questions from the Kidney Self-Assessment Program (KSAP) and Nephrology Self-Assessment Program (NephSAP). We employed two prompting methods: one presenting the original question unchanged (regular prompting) and the other applying the chain of thought approach. The McNemar test was used to assess the difference in accuracy between the paired responses, and Cohen's kappa was used to evaluate agreement between the two prompting methods.
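The paired statistical comparison described above can be sketched in plain Python: McNemar's test (with continuity correction) operates only on the discordant pairs, while Cohen's kappa compares observed agreement with the agreement expected by chance from each method's marginal accuracy. The counts passed in below are hypothetical, for illustration only, not the study's data:

```python
import math

def mcnemar_and_kappa(both_correct: int, cot_only: int,
                      reg_only: int, both_wrong: int):
    """Compare two prompting methods on paired binary outcomes.

    cot_only = questions correct under chain of thought but not regular;
    reg_only = questions correct under regular but not chain of thought.
    Returns (mcnemar_statistic, mcnemar_p, cohens_kappa).
    """
    n = both_correct + cot_only + reg_only + both_wrong

    # McNemar's test with continuity correction uses only the discordant
    # pairs; under H0 the statistic follows a chi-square with 1 df.
    stat = (abs(cot_only - reg_only) - 1) ** 2 / (cot_only + reg_only)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) tail prob.

    # Cohen's kappa: observed agreement vs. chance agreement implied by
    # the marginal accuracy of each prompting method.
    p_obs = (both_correct + both_wrong) / n
    p_cot = (both_correct + cot_only) / n
    p_reg = (both_correct + reg_only) / n
    p_exp = p_cot * p_reg + (1 - p_cot) * (1 - p_reg)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return stat, p_value, kappa

# Hypothetical counts for 101 paired questions (illustrative only):
stat, p, kappa = mcnemar_and_kappa(both_correct=79, cot_only=9,
                                   reg_only=3, both_wrong=10)
```

Because McNemar's test ignores the concordant pairs, a large improvement on a handful of discordant questions can still fail to reach significance when the discordant count is small, which is the scenario this study's sample size makes likely.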

Results

ChatGPT-4 demonstrated an accuracy of 87.1% with chain of thought prompting, compared with 81.2% under regular prompting (P=0.15). The kappa statistic for agreement between the two prompting methods was 0.80. The two methods gave consistent responses on 84.2% of the questions, with 78.2% answered correctly by both. Chain of thought prompting correctly answered nine questions that were missed under regular prompting. Of the thirteen questions missed under chain of thought prompting, 76.9% were errors repeated from regular prompting; only three questions answered incorrectly with chain of thought prompting were correct under regular prompting.

Conclusion

The chain of thought approach improved ChatGPT-4's accuracy on nephrology questions compared with regular prompting, although the difference was not statistically significant. These findings emphasize the importance of developing effective prompting strategies to optimize the application of LLMs in clinical decision support. Future research should evaluate whether these findings generalize across medical specialties to maximize the benefits of LLMs in clinical decision-making.