Abstract: FR-PO013
Is Artificial Intelligence (AI) the Missing Member of the Fellowship Interview Panel?
Session Information
- Classroom to Bedside: Transforming Medical Education
October 25, 2024 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category
- 1000 Educational Research
Authors
- Hommos, Musab S., Mayo Clinic Arizona, Scottsdale, Arizona, United States
- Bhasin-Chhabra, Bhavna, Mayo Clinic Arizona, Scottsdale, Arizona, United States
- Keddis, Mira T., Mayo Clinic Arizona, Scottsdale, Arizona, United States
Group or Team Name
- Mayo Clinic Arizona Nephrology Fellowship
Background
Creating an objective summary of interviewers' feedback to rank fellowship applicants is a challenging task that can be adversely affected by implicit bias. In this study, we examined the performance of ChatGPT (CGPT) in summarizing interviewers' evaluations of applicants and compared the CGPT-generated ranking to our program's final rank list.
Methods
For each applicant, 4 faculty independently generated evaluation narratives of about 100 words each. The program director (PD) uses these narratives to create a list of strengths & weaknesses for the ranking meetings. In this study, we provided de-identified narratives to CGPT-4.0. We compared the performance of two prompts: a simple prompt that asked CGPT to summarize the strengths and weaknesses and rank applicants without any program-specific details, and a complex prompt that additionally described our program's mission, aims, and desired applicant attributes. We used data from the 2022 & 2023 recruitment seasons. The Spearman rank correlation coefficient was used to compare the CGPT rank list to the program's final rank list. The study was deemed exempt by the IRB.
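To make the two prompting conditions concrete, the sketch below shows how such a comparison could be scripted. The abstract does not state how CGPT-4.0 was accessed; this sketch assumes the OpenAI chat-completions API, and the prompt wording, model name, and function names are hypothetical illustrations, not the authors' actual materials.

```python
# Illustrative sketch only; prompt wording, model name, and API access
# are assumptions, not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SIMPLE_PROMPT = (
    "Summarize each applicant's strengths and weaknesses from the "
    "interview narratives below, then rank the applicants."
)

COMPLEX_PROMPT = (
    "You are assisting a nephrology fellowship program whose mission, "
    "aims, and desired applicant attributes are described here: "
    "<program-specific details>. Summarize each applicant's strengths "
    "and weaknesses from the interview narratives below, then rank the "
    "applicants against those attributes."
)

def rank_applicants(prompt: str, narratives: str) -> str:
    """Send de-identified narratives to the model under a given prompt."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": narratives},
        ],
    )
    return response.choices[0].message.content
```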
Results
There were 29 applicants from 2022 and 26 from 2023. Narratives from 2022 were about 50% shorter than those from 2023. The complex CGPT prompt outperformed the simple prompt both in detecting strengths and weaknesses and in the final ranking. The complex prompt-generated rank list had a correlation coefficient with our rank list of 0.95 (p<0.001) for 2022 and 0.99 (p<0.001) for 2023; the simple prompt correlations were 0.85 (p<0.001) and 0.83 (p<0.001), respectively. See Figure 1.
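As an illustration of the statistic reported above, the following minimal sketch computes a Spearman rank correlation between two rank lists with scipy. The rank lists here are invented for the example and do not reproduce the study data.

```python
# Illustrative only: the rank lists below are invented, not study data.
from scipy.stats import spearmanr

# Hypothetical final program ranks and model-generated ranks for 6 applicants.
program_ranks = [1, 2, 3, 4, 5, 6]
cgpt_ranks = [1, 3, 2, 4, 6, 5]

# For ranks without ties, Spearman's rho equals
#   rho = 1 - 6 * sum(d_i**2) / (n * (n**2 - 1)),
# where d_i is the difference between paired ranks.
rho, p_value = spearmanr(program_ranks, cgpt_ranks)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")
```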
Conclusion
Longer evaluation narratives and a more complex prompt that included program-specific information resulted in excellent performance by CGPT in summarizing applicants' strengths & weaknesses, yielding a final rank list that correlated very closely with the program's rank list. AI limitations included missing subtle applicant attributes that a program may consider important but that were not defined in the prompt. Future research is needed to study the effect of AI-assisted ranking on implicit bias in the ranking process.