This study analyzes the quality of intent recognition, and user satisfaction with answers generated from intent-based prompt reformulations, for two recent ChatGPT models: GPT-3.5 Turbo and GPT-4 Turbo. The results show that GPT-4 outperforms GPT-3.5 at recognizing common intents but is often outperformed by GPT-3.5 on less frequent intents. When the user intent is correctly recognized, users are more satisfied with GPT-4's answers than with GPT-3.5's. However, users prefer the models' responses to their original prompts over the responses to the reformulated prompts.

Publication date: 7 Feb 2024
Project Page: https://arxiv.org/abs/2402.02136v1
Paper: https://arxiv.org/pdf/2402.02136