The study examines the potential harms of AI-generated text, which stem mainly from non-factual, unfair, and toxic content. The authors propose FFT, a new benchmark of 2116 instances for evaluating the harmlessness of Large Language Models (LLMs) in terms of factuality, fairness, and toxicity. The study evaluates 9 representative LLMs that span a range of parameter scales, training stages, and creators. The findings show that LLM harmlessness remains unsatisfactory, which motivates further research into harmless LLMs.

 

Publication date: 30 Nov 2023
Project Page: https://arxiv.org/abs/2311.18580v1
Paper: https://arxiv.org/pdf/2311.18580