Using Large Language Models to Assess Tutors’ Performance in Reacting to Students Making Math Errors

The paper investigates whether large language models (LLMs) can evaluate how tutors respond when students make math errors. The study analyzes 50 real-life tutoring dialogues and finds that GPT-3.5-Turbo and GPT-4 are proficient at assessing tutors’ reactions to students’ errors. Both models have limitations, however, such as overidentifying students’ errors. Future work will use a larger dataset and evaluate learning transfer in real-life scenarios.

Publication date: 9 Jan 2024
Project Page: https://arxiv.org/abs/2401.03238v1
Paper: https://arxiv.org/pdf/2401.03238

Generative Large Language Models, GPT-4, Intelligent Tutoring Systems, math errors, real-time feedback
