DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
This paper introduces a new diffusion autoregressive model (DiffAR) for generating high-quality raw speech waveforms. The model generates overlapping frames sequentially, each conditioned on a portion of the previously generated…