The article introduces MuTox, a multilingual audio-based toxicity detection system and the first of its kind. Its dataset includes 20,000 audio utterances for English and Spanish, and 4,000 for 19 other languages. The system offers zero-shot toxicity detection across a wide range of languages, outperforming existing text-based classifiers by more than 1% AUC while expanding language coverage more than tenfold. Compared to a wordlist-based classifier covering a similar number of languages, MuTox improves precision and recall by approximately 2.5 times.
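To make the wordlist comparison concrete, here is a minimal sketch of how a wordlist-based classifier works and how its precision and recall are computed. It is not the baseline used in the paper: the wordlist, the transcripts, and their labels are made-up placeholders, and the function names are hypothetical.

```python
import re

# Hypothetical placeholder wordlist; real wordlists are per-language and far larger.
TOXIC_WORDS = {"idiot", "stupid"}


def wordlist_flag(text, wordlist=TOXIC_WORDS):
    """Return True if any token of the text appears in the wordlist."""
    tokens = re.findall(r"\w+", text.lower())
    return any(token in wordlist for token in tokens)


def precision_recall(predictions, labels):
    """Compute precision and recall for boolean predictions and labels."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)
    fp = sum(1 for p, y in pairs if p and not y)
    fn = sum(1 for p, y in pairs if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy transcripts with made-up labels (True = toxic).
samples = [
    ("you are an idiot", True),
    ("what a lovely day", False),
    ("that was a stupid mistake, sorry", False),  # listed word, benign context
    ("nobody would miss you", True),              # toxic, but no listed word
]

predictions = [wordlist_flag(text) for text, _ in samples]
labels = [label for _, label in samples]
precision, recall = precision_recall(predictions, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The last two toy samples show the two typical failure modes of word matching: a listed word used in a benign context lowers precision, and toxic speech that avoids listed words lowers recall. A trained classifier can, in principle, reduce both.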

Publication date: 11 Jan 2024
Project Page: https://github.com/unitaryai/detoxify
Paper: https://arxiv.org/pdf/2401.05060