The article presents OPERA, a novel decoding method that reduces hallucinations in multi-modal large language models (MLLMs) without requiring any additional data, knowledge, or training. The underlying observation is that hallucinations are closely tied to knowledge-aggregation patterns in the self-attention matrix: during generation, the model tends to over-trust a few earlier summary tokens rather than attending to the full context. OPERA counters this with an Over-trust Penalty, a term added to candidate scores during beam-search decoding that demotes continuations exhibiting this aggregation pattern, and a Retrospection-Allocation strategy that rolls decoding back to the over-trusted token and reselects when the penalty alone does not suffice. The method shows significant hallucination mitigation across different MLLMs and evaluation metrics.
Publication date: 29 Nov 2023
Project Page: This link
Paper: https://arxiv.org/pdf/2311.17911
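To make the mechanics concrete, here is a minimal sketch of how such an over-trust penalty could plug into beam-search scoring. It is not the authors' implementation: the window size, the scale factor, the penalty weight `alpha`, and the rollback patience are illustrative assumptions, and the real method operates on the model's actual self-attention maps during decoding.

```python
# A minimal sketch of an over-trust penalty in the spirit of OPERA
# (not the authors' code). It assumes `attn` is a (k x k) lower-triangular
# self-attention window over the last k generated tokens (heads averaged).
import numpy as np

def overtrust_penalty(attn: np.ndarray, scale: float = 50.0) -> float:
    """Return the strongest column-wise 'knowledge aggregation' score.

    A column whose entries stay large down the window marks an earlier
    token that later tokens keep over-trusting, the pattern OPERA ties
    to hallucination. The scale factor keeps the product of sub-1
    attention values from vanishing.
    """
    w = np.tril(attn)                      # token i attends only to j <= i
    w = np.where(w > 0.0, scale * w, 1.0)  # neutralize masked entries
    return float(np.prod(w, axis=0).max())

def penalized_score(log_prob: float, attn: np.ndarray,
                    alpha: float = 1.0) -> float:
    """Beam-search score with the over-trust penalty subtracted, so
    candidates showing a strong aggregation column are demoted."""
    return log_prob - alpha * overtrust_penalty(attn)

# Retrospection-allocation, roughly: if the same column index yields the
# maximal penalty for several consecutive steps, roll the beam back to
# that position and reselect among tokens other than the over-trusted one.
def should_rollback(recent_argmax_cols: list[int], patience: int = 3) -> bool:
    return (len(recent_argmax_cols) >= patience
            and len(set(recent_argmax_cols[-patience:])) == 1)
```

The column-wise product is the key design choice in this sketch: it only grows large when a column stays uniformly strong down the window, which distinguishes a genuinely over-trusted anchor token from one that receives a single isolated attention spike.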