Masked Audio Generation using a Single Non-Autoregressive Transformer
The paper introduces MAGNET, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike previous works, MAGNET is a single-stage, non-autoregressive transformer. During training,…