This paper by Mark Levy and colleagues at Apple demonstrates how conditional generation from diffusion models can be applied to a range of music production tasks: continuation, inpainting, and regeneration of musical audio; creating smooth transitions between different music tracks; and transferring desired stylistic characteristics to existing audio clips. The approach allows fine-grained control over the musical output and removes the need for paired data during training. The authors suggest that using a diffusion model as a generative prior has considerable potential for music production.

Publication date: 1 Nov 2023
Project Page: https://arxiv.org/abs/2311.00613
Paper: https://arxiv.org/pdf/2311.00613