The paper presents a context understanding benchmark for Large Language Models (LLMs). It evaluates pre-trained dense models under in-context learning (ICL) settings and assesses the context understanding of quantized models under the same settings. The results show that pre-trained dense models struggle to understand more nuanced contextual features compared with fine-tuned models. The study also finds that 3-bit post-training quantization reduces performance on the benchmark to varying degrees. The research was conducted during an internship at Apple, and the code is publicly available.
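
To make the in-context learning setup concrete, below is a minimal sketch of how a few-shot ICL evaluation prompt might be assembled for a context-understanding task such as coreference resolution. The task, template, and examples are hypothetical illustrations, not taken from the paper or its benchmark.

```python
# A minimal sketch of few-shot in-context learning (ICL) prompting.
# The instruction, demonstrations, and query below are hypothetical,
# chosen only to illustrate a "nuanced contextual feature" (pronoun
# resolution); the paper's actual tasks and templates may differ.

def build_icl_prompt(instruction, demonstrations, query):
    """Concatenate an instruction, k labeled demonstrations, and the query."""
    parts = [instruction]
    for context, answer in demonstrations:
        parts.append(f"Context: {context}\nAnswer: {answer}")
    # The final block is left unanswered for the model to complete.
    parts.append(f"Context: {query}\nAnswer:")
    return "\n\n".join(parts)

demos = [
    ("Alice said she would bring her laptop, but she forgot it.",
     "it -> Alice's laptop"),
    ("The council denied the marchers a permit because they feared violence.",
     "they -> the council"),
]
prompt = build_icl_prompt(
    "Resolve the pronoun in each context.",
    demos,
    "Bob handed Carol the book because she asked for it.",
)
print(prompt)
```

In this style of evaluation, the model receives the full prompt and its completion is scored against the reference answer; the same prompts can be fed to a quantized model to measure how quantization affects contextual understanding.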

Publication date: 2 Feb 2024
Project Page: https://github.com/apple/ml-llm-contextualization-eval
Paper: https://arxiv.org/pdf/2402.00858