The paper introduces Audiobox, a unified model for generating both speech and sound with enhanced controllability. For speech, it allows independent control over the transcript, vocal style, and other audio attributes. To generalize well with limited labeled data, the model is pre-trained on large amounts of unlabeled audio using a self-supervised infilling objective. Audiobox sets new benchmarks in speech and sound generation and offers new methods for producing audio with novel vocal and acoustic styles. It also incorporates Bespoke Solvers, which accelerate generation without compromising performance on the evaluated tasks.
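The core idea of self-supervised infilling can be illustrated with a minimal sketch: hide a contiguous span of an unlabeled audio feature sequence and train the model to reconstruct it from the surrounding context. This is a conceptual illustration only (function names, the zero-fill masking, and the plain MSE loss are assumptions for exposition); Audiobox's actual objective couples this kind of masking with its generative training loss.

```python
import numpy as np

def infilling_batch(features, mask_ratio=0.3, rng=None):
    """Build a self-supervised infilling example from an unlabeled
    feature sequence of shape (T, D): hide one contiguous span and
    return a boolean mask marking which frames must be reconstructed.
    (Illustrative sketch, not the paper's exact procedure.)"""
    rng = rng or np.random.default_rng()
    T = features.shape[0]
    span = max(1, int(T * mask_ratio))
    start = int(rng.integers(0, T - span + 1))
    context = features.copy()
    context[start:start + span] = 0.0           # hidden (masked) frames
    loss_mask = np.zeros(T, dtype=bool)
    loss_mask[start:start + span] = True        # loss computed here only
    return context, loss_mask

def masked_reconstruction_loss(pred, target, loss_mask):
    """Mean squared error restricted to the masked span."""
    diff = (pred - target)[loss_mask]
    return float(np.mean(diff ** 2))
```

Because the target is the audio itself, no labels are needed, which is what lets the model pre-train on large unlabeled corpora before fine-tuning on the smaller labeled sets.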
Publication date: 25 Dec 2023
Project Page: https://audiobox.metademolab.com/
Paper: https://arxiv.org/pdf/2312.15821