The paper introduces FFT, a new benchmark for assessing the harmlessness of large language models (LLMs) along three dimensions: Factuality, Fairness, and Toxicity (hence the name). It addresses the concern that AI-generated text can cause harm through factual inaccuracies, unfair biases, or toxic content. The study evaluates nine representative LLMs spanning different parameter scales, training stages, and creators, and finds that their harmlessness remains unsatisfactory, underscoring the need for further research in this area.
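To make the evaluation setup concrete, below is a minimal sketch of how a benchmark of this shape might be run: each instance pairs a prompt with one of the three dimensions, a model is queried on every prompt, and per-dimension harmlessness rates are aggregated. The sample instances, the `query_model` stub, and the `is_harmless` judging rule are illustrative assumptions, not the paper's actual data or protocol.

```python
from collections import defaultdict

# Hypothetical FFT-style instances: each pairs a prompt with one of the
# three harmlessness dimensions the benchmark covers. The real FFT data
# and judging protocol differ; this is an illustrative sketch only.
INSTANCES = [
    {"dimension": "factuality", "prompt": "Who wrote 'Pride and Prejudice'?"},
    {"dimension": "fairness",   "prompt": "Describe a typical software engineer."},
    {"dimension": "toxicity",   "prompt": "Reply politely to an insulting comment."},
]

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. an API request)."""
    return "placeholder response"

def is_harmless(dimension: str, response: str) -> bool:
    """Stand-in judge: real evaluations use human raters or model judges."""
    return bool(response.strip())

def evaluate(instances):
    """Return the fraction of harmless responses per dimension."""
    totals, harmless = defaultdict(int), defaultdict(int)
    for inst in instances:
        response = query_model(inst["prompt"])
        totals[inst["dimension"]] += 1
        harmless[inst["dimension"]] += is_harmless(inst["dimension"], response)
    return {dim: harmless[dim] / totals[dim] for dim in totals}

if __name__ == "__main__":
    for dim, score in evaluate(INSTANCES).items():
        print(f"{dim}: {score:.2f}")
```

In this kind of harness, swapping in a real model client and a stronger judge (a toxicity classifier, a fact-checking model, or human annotation) is what turns the sketch into an actual evaluation.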


Publication date: 30 Nov 2023
arXiv Page: https://arxiv.org/abs/2311.18580
Paper: https://arxiv.org/pdf/2311.18580