This research evaluates how well GPT-4's evaluations align with those of human clinician experts when assessing responses to ophthalmology-related patient queries generated by fine-tuned large language model (LLM) chatbots. A dataset of 400 general ophthalmology questions with paired answers was created and divided into fine-tuning and testing sets. Five different LLMs were fine-tuned, and GPT-4's evaluations of their responses were compared against rankings from five clinicians to assess clinical alignment. The study found significant agreement between the GPT-4 evaluation and the human clinician rankings. However, the GPT-4 evaluation also identified clinical inaccuracies in the LLM-generated responses.
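The summary does not specify which agreement statistic the authors used. As a minimal sketch of how such rank agreement could be quantified, the snippet below computes Kendall's tau between a GPT-4 ranking and a clinician ranking of the five models' answers to a single question; all values and variable names are hypothetical placeholders, not data from the paper.

```python
# Hypothetical sketch: quantifying agreement between GPT-4 rankings and
# clinician rankings using Kendall's tau. The rankings below are invented
# placeholders; the paper's actual statistic and data are not given here.
from scipy.stats import kendalltau

# Rank assigned to each of the five fine-tuned LLMs' answers for one
# question (1 = best). Illustrative values only.
gpt4_ranks = [1, 2, 3, 4, 5]
clinician_ranks = [1, 3, 2, 4, 5]  # e.g., one clinician's ranking

# tau near 1 indicates strong agreement; the p-value tests the null
# hypothesis of no association between the two rankings.
tau, p_value = kendalltau(gpt4_ranks, clinician_ranks)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
```

In practice one would aggregate such per-question statistics across the full test set and across all five clinicians before drawing conclusions about alignment.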
Publication date: 16 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.10083