SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
The article discusses a proposed sound-design system that extracts repetitive actions from a video and uses them, in conjunction with audio or textual embeddings, to condition a diffusion…