The study critically examines the use of toxicity detection algorithms in proactive content moderation systems. It explores how such systems can be misused: to circumvent detection, to validate and gamify hate, and to manipulate algorithmic models in ways that amplify harm. The study also highlights contextual complexities that can deepen inequalities in content moderation processes.
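To make the circumvention risk concrete, below is a minimal, hypothetical sketch of a proactive moderation gate. Everything here is an illustrative assumption rather than the systems studied in the paper: the `toxicity_score` function, the `BLOCKLIST` lexicon, and the `THRESHOLD` value are stand-ins. Production detectors use learned classifiers rather than keyword lists, but they remain vulnerable to analogous input perturbations.

```python
# Hypothetical sketch of a proactive moderation gate. Not the paper's system:
# the scorer, blocklist, and threshold below are illustrative assumptions.

BLOCKLIST = {"idiot", "moron"}   # toy lexicon standing in for a learned model
THRESHOLD = 0.5                  # score at or above which a post is held


def toxicity_score(text: str) -> float:
    """Toy scorer: fraction of tokens matching the blocklist.
    A real system would call a trained classifier instead."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(token.strip(".,!?") in BLOCKLIST for token in tokens)
    return hits / len(tokens)


def moderate(text: str) -> str:
    """Proactive check: score the post *before* it is published."""
    return "hold-for-review" if toxicity_score(text) >= THRESHOLD else "publish"


if __name__ == "__main__":
    print(moderate("you idiot"))   # -> hold-for-review
    # Trivial character substitution evades the lexical check,
    # illustrating the circumvention risk the study describes.
    print(moderate("you id1ot"))   # -> publish
```

The second call shows how a single-character substitution slips past the gate; learned models are harder to fool this way, but the study's point is that adversaries probe and adapt to whatever detector is deployed.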
Publication date: 19 Jan 2024
Project Page: https://doi.org/XXXXXXX.XXXXXXX
Paper: https://arxiv.org/pdf/2401.10629