This study investigates GPT-4's performance in healthcare applications. A novel prompting technique elicited the model's confidence score before and after it answered questions from the United States Medical Licensing Examination (USMLE). The results indicate that feedback influences the model's relative confidence, but not consistently. The research contributes to understanding the reliability of large language models (LLMs) such as GPT-4 in sensitive domains like healthcare, and offers insights into how feedback mechanisms might be optimized to enhance AI-assisted medical education and decision support.
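The pre/post confidence-elicitation protocol described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: `ask_model` is a hypothetical stand-in for a real GPT-4 API call, and the prompt wording is assumed, not taken from the study.

```python
# Sketch of a pre/post confidence-elicitation loop (assumptions labeled).
# `ask_model` is a hypothetical stub standing in for a real GPT-4 API call;
# replace it with an actual chat-completion request in practice.

def ask_model(prompt: str) -> str:
    """Stub that mimics a chat model's reply (illustrative only)."""
    if "confidence" in prompt.lower():
        return "85"  # model reports a 0-100 confidence score
    return "B"       # model's answer choice to the USMLE-style item

def evaluate_item(question: str, feedback: str) -> dict:
    # 1. Elicit confidence before the model answers.
    pre = int(ask_model(
        f"Rate your confidence (0-100) that you can answer this correctly: {question}"))
    # 2. Pose the question itself.
    answer = ask_model(question)
    # 3. Provide feedback, then elicit confidence again.
    post = int(ask_model(
        f"{feedback} Rate your confidence (0-100) in your answer now."))
    return {"answer": answer,
            "pre_confidence": pre,
            "post_confidence": post,
            "shift": post - pre}

result = evaluate_item(
    "A 45-year-old man presents with chest pain... Which diagnosis is most likely?",
    "Your answer was incorrect.")
```

Comparing `pre_confidence` and `post_confidence` across many items is one way to measure how feedback shifts the model's reported confidence, which is the effect the study examines.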


Publication date: 15 Feb 2024
Project Page: https://arxiv.org/abs/2402.09654v1
Paper: https://arxiv.org/pdf/2402.09654