Pretrained language-vision models Papers

CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

root, June 21, 2023

CLIPSonic is an approach to text-to-audio synthesis that learns from unlabeled videos and pretrained language-vision models. The study addresses the difficulty of acquiring high-quality text annotations for audio…
