Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
The article presents a novel training paradigm, ITIT, for vision-language generative models. Current models rely on large corpora of paired image-text data for optimal performance. However, collecting such data leads…