The paper examines the use of Large Language Models (LLMs) such as GPT-3 and GPT-4 for content moderation, evaluating their performance on two common tasks: rule-based community moderation and toxic content detection. For rule-based moderation, where models were prompted with the rules of individual Reddit communities, LLMs showed promising results, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, they outperformed existing toxicity classifiers. However, increasing model size added only marginal benefit to toxicity detection, suggesting a potential performance plateau.
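
The rule-based setup can be illustrated with a short sketch: the model is shown a community's rules alongside a comment and asked for a binary violation judgment, which can then be scored against human moderator decisions. The prompt wording, model name, and helper function below are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of rule-based moderation with an LLM: show the model a
# community's rules plus a comment, ask for a YES/NO violation judgment.
# Prompt text and model choice are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def violates_rules(rules: list[str], comment: str, model: str = "gpt-4") -> bool:
    """Return True if the model judges `comment` to violate any of `rules`."""
    rule_text = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(rules))
    prompt = (
        "You are a moderator for an online community with these rules:\n"
        f"{rule_text}\n\n"
        f"Comment: {comment!r}\n\n"
        "Does this comment violate any rule? Answer YES or NO."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep judgments as deterministic as possible
    )
    answer = response.choices[0].message.content.strip().upper()
    return answer.startswith("YES")


# Example: a comment that likely breaks a no-spam rule.
rules = ["Be civil.", "No self-promotion or spam."]
print(violates_rules(rules, "Buy my crypto course at example.com!!!"))
```

Accuracy and precision figures like those reported above would come from running such a judge over comments with known moderation outcomes and comparing its answers to the human moderators' decisions.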

Publication date: 28 Sep 2023
Project Page: https://www.aaai.org/
Paper: https://arxiv.org/pdf/2309.14517