The article introduces GeoEval, a comprehensive benchmark of geometry math problems designed to evaluate how proficiently Large Language Models (LLMs) and Multi-Modal Models (MMs) solve them. The benchmark comprises several subsets (a main set, a backward-reasoning set, an augmented set, and a hard set), each probing a different aspect of geometric problem-solving. The study found that the WizardMath model performed best overall, yet its accuracy dropped sharply on the hard subset, highlighting the need to test models on data they have not been pre-trained on. Additionally, GPT-series models performed better on problems they had rephrased themselves, suggesting self-rephrasing as a potential method for enhancing model performance.
Publication date: 15 Feb 2024
Project Page: https://github.com/GeoEval/GeoEval
Paper: https://arxiv.org/pdf/2402.10104
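
To illustrate the rephrasing finding, here is a minimal sketch (not from the GeoEval repository) of a two-pass "rephrase, then solve" prompting loop using the OpenAI Python client; the model name, prompts, and example problem are assumptions chosen for illustration only.

```python
# Hypothetical two-pass prompting sketch: ask the model to rephrase a geometry
# problem in its own words, then solve the rephrased version. Model name and
# prompt wording are assumptions, not taken from the GeoEval paper or repo.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def rephrase_then_solve(problem: str, model: str = "gpt-4") -> str:
    # Pass 1: restate the problem clearly and completely.
    rephrased = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Rephrase this geometry problem clearly and completely:\n{problem}",
        }],
    ).choices[0].message.content

    # Pass 2: solve the rephrased version instead of the original wording.
    answer = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Solve this geometry problem step by step:\n{rephrased}",
        }],
    ).choices[0].message.content
    return answer


if __name__ == "__main__":
    print(rephrase_then_solve(
        "In triangle ABC, AB = AC and angle A = 40 degrees. Find angle B."
    ))
```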