The article introduces AudioSeal, an audio watermarking technique designed specifically for localized detection of AI-generated speech, in response to the growing security concerns raised by advances in speech generative models. AudioSeal uses a generator/detector architecture and a new perceptual loss inspired by auditory masking. According to automatic and human evaluation metrics, it achieves state-of-the-art robustness to real-life audio manipulations and state-of-the-art imperceptibility. Its fast, single-pass detector also outperforms existing models in detection speed, making it suitable for large-scale and real-time applications.
Publication date: 31 Jan 2024
Project Page: https://github.com/facebookresearch/audioseal
Paper: https://arxiv.org/pdf/2401.17264