Kidney Week

Abstract: SA-PO005

Data Preprocessing: A Key Factor in Large Language Models' Performance in Critical Care Nephrology

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science

Authors

  • Sheikh, M. Salman, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Qureshi, Fawad, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Miao, Jing, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Kashani, Kianoush, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States

Background

The use of large language models (LLMs) in medical education and clinical decision-making is gaining traction. In clinical evaluations, data often arrive as tables from outside sources, and these tables can be supplied to multimodal LLMs either as images or after being reformatted into text. However, the accuracy of LLMs in interpreting complex clinical data, particularly when the data are presented as images rather than reformatted text, remains a concern. This study evaluates the impact of data formatting on the performance of ChatGPT-4 and Claude 3 Opus in answering critical care nephrology questions from the Kidney Self-Assessment Program (KSAP).
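As an illustration of the text option, the sketch below flattens a table into pipe-delimited plain text with a header line. The function name `table_to_text`, the delimiter, and the laboratory rows are hypothetical; the abstract does not describe the exact text format the authors used.

```python
def table_to_text(headers: list[str], rows: list[list[object]]) -> str:
    """Serialize a table as pipe-delimited plain text, one row per line."""
    # Reject ragged rows so no value is silently shifted into the wrong column
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}: {row!r}")
    lines = [" | ".join(h.strip() for h in headers)]
    lines += [" | ".join(str(cell).strip() for cell in row) for row in rows]
    return "\n".join(lines)


# Hypothetical laboratory panel, for illustration only
headers = ["Test", "Value", "Unit"]
rows = [
    ["Sodium", 138, "mEq/L"],
    ["Potassium", 5.9, "mEq/L"],
    ["Creatinine", 2.4, "mg/dL"],
]
print(table_to_text(headers, rows))
```

Keeping the header on its own line and the units in a separate column means every value stays explicitly labeled once the table's visual layout is gone.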

Methods

Fifty-six acute kidney injury (AKI) and critical care nephrology questions from KSAP were reviewed; the analysis focused on the 46 questions that included tables with pertinent information, such as laboratory values and diagnostic results. The tables were first input in an image-encoded format (screenshots), and the models' responses were recorded. The tables were then reformatted into pure-text format, and the models were reassessed on the same questions. The McNemar test assessed the statistical significance of the improvement in accuracy, and Cohen's kappa measured the agreement between each model's pre-formatting and post-formatting answers.
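The two statistics named above can be computed from paired per-question data. This is a minimal sketch: it assumes the exact (binomial) form of the McNemar test, which the abstract does not specify, and scores agreement on answer choices (such as option letters) for Cohen's kappa. The function names and toy inputs are hypothetical.

```python
from collections import Counter
from math import comb


def mcnemar_exact(before: list[bool], after: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correct/incorrect outcomes."""
    if len(before) != len(after):
        raise ValueError("paired lists must have equal length")
    # b: correct before, incorrect after; c: incorrect before, correct after
    b = sum(1 for x, y in zip(before, after) if x and not y)
    c = sum(1 for x, y in zip(before, after) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    # Under H0 the discordant pairs split as Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cohens_kappa(labels1: list[str], labels2: list[str]) -> float:
    """Cohen's kappa for two sets of categorical labels on the same items."""
    if len(labels1) != len(labels2) or not labels1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels1)
    observed = sum(a == b for a, b in zip(labels1, labels2)) / n
    # Chance agreement from each run's marginal answer distribution
    counts1, counts2 = Counter(labels1), Counter(labels2)
    expected = sum(counts1[k] * counts2[k] for k in counts1) / n ** 2
    if expected == 1.0:
        return float("nan")  # undefined: both runs used one identical category
    return (observed - expected) / (1 - expected)


# Toy example with hypothetical outcomes for five questions
before = [True, False, False, True, False]
after = [True, True, True, True, False]
print(mcnemar_exact(before, after))
print(cohens_kappa(["A", "C", "B", "D", "E"], ["A", "B", "D", "D", "E"]))
```

McNemar's test uses only the discordant pairs (questions whose correctness changed between runs), which suits a before/after comparison on the same questions. Kappa instead corrects raw answer agreement for the agreement expected by chance.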

Results

In the initial run, with tables in image-encoded format, ChatGPT-4 and Claude 3 Opus achieved accuracies of 50% and 43.5%, respectively. After the tables were reformatted from image-encoded to pure-text format, the accuracies of ChatGPT-4 and Claude 3 Opus improved significantly to 73.9% and 60.87%, respectively (p<0.001). Cohen's kappa for agreement between pre-formatting and post-formatting answers was 0.141 for ChatGPT-4 and, after aligning responses to the same set of questions, 0.350 for Claude 3 Opus.
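The reported percentages correspond to whole-number counts out of the 46 table-based questions. The sketch below recovers those implied counts, assuming every question was scored in both runs (the counts are inferred, not stated in the abstract), and maps the kappa values onto the Landis and Koch (1977) benchmarks, one common but arbitrary convention.

```python
N_QUESTIONS = 46  # table-based questions analyzed, per the Methods

# Accuracies as reported in the Results (percent)
REPORTED = {
    ("ChatGPT-4", "image"): 50.0,
    ("Claude 3 Opus", "image"): 43.5,
    ("ChatGPT-4", "text"): 73.9,
    ("Claude 3 Opus", "text"): 60.87,
}


def implied_correct(percent: float, n: int = N_QUESTIONS) -> int:
    """Nearest whole number of correct answers out of n for a reported accuracy."""
    return round(percent / 100 * n)


def landis_koch_band(kappa: float) -> str:
    """Label a kappa value using the Landis and Koch (1977) benchmarks."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


for (model, fmt), pct in REPORTED.items():
    k = implied_correct(pct)
    # Recompute the accuracy from the inferred count to confirm it matches
    print(f"{model:13} {fmt:5} reported {pct}% -> {k}/{N_QUESTIONS} = {100 * k / N_QUESTIONS:.2f}%")

for model, kappa in [("ChatGPT-4", 0.141), ("Claude 3 Opus", 0.350)]:
    print(f"{model}: kappa {kappa} ({landis_koch_band(kappa)} agreement)")
```

On this scale, the low pre/post agreement indicates that many answers changed once the tables were supplied as text.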

Conclusion

Data formatting substantially affects the performance of LLMs in interpreting complex clinical data in critical care nephrology. Reformatting tables from image-encoded to pure-text format significantly improved the accuracy of both ChatGPT-4 and Claude 3 Opus on KSAP questions, highlighting the importance of data preprocessing in optimizing LLM performance for clinical decision support.