The paper examines whether language models (LMs) express uncertainty when answering questions. The authors find that LMs often fail to express uncertainty, even when their responses are incorrect, and that when prompted to verbalize confidence they tend to be overconfident, producing high error rates among confidently stated answers. Human-subject experiments show that users rely heavily on LM responses regardless of whether those responses are marked with expressions of certainty. The study also traces part of the problem to a bias against texts containing uncertainty expressions in the preference-annotated datasets used for RLHF alignment. The paper concludes with design recommendations and strategies to mitigate these issues.
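The overconfidence finding is essentially a calibration measurement: prompt the model to verbalize a confidence score alongside its answer, then check how often its high-confidence answers are wrong. Below is a minimal sketch of that kind of measurement; the prompt wording, the `ask_model` callable, exact-match scoring, and the 90% threshold are illustrative assumptions, not the paper's actual protocol.

```python
import re
from typing import Callable, Iterable, Tuple


def parse_response(text: str):
    """Extract the answer text and a 0-100 verbalized confidence from a
    response formatted as '<answer> ... CONFIDENCE: NN%'."""
    match = re.search(r"CONFIDENCE:\s*(\d{1,3})\s*%", text)
    confidence = int(match.group(1)) if match else None
    answer = re.sub(r"CONFIDENCE:.*", "", text, flags=re.DOTALL).strip()
    return answer, confidence


def high_confidence_error_rate(
    dataset: Iterable[Tuple[str, str]],
    ask_model: Callable[[str], str],  # assumed LM call: prompt -> response text
    threshold: int = 90,
) -> float:
    """Fraction of answers given with confidence >= threshold that are wrong.

    `dataset` yields (question, gold_answer) pairs; exact-match scoring is
    used here purely for illustration.
    """
    wrong, total = 0, 0
    for question, gold in dataset:
        prompt = (
            f"{question}\n"
            "Answer the question, then state your confidence on its own line "
            "as 'CONFIDENCE: NN%'."
        )
        answer, confidence = parse_response(ask_model(prompt))
        if confidence is not None and confidence >= threshold:
            total += 1
            wrong += int(answer.strip().lower() != gold.strip().lower())
    return wrong / total if total else float("nan")
```

A high value from such a measurement (many wrong answers among those the model labels as ~90%+ confident) is what "overconfident" means operationally here.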

Publication date: 15 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.06730