The article introduces SODA, a self-supervised diffusion model designed for representation learning. The model pairs an image encoder, which compresses a source view into a compact latent representation, with a denoising decoder that uses this latent to generate related novel views. By imposing a tight bottleneck between the encoder and the decoder, and by using novel view synthesis as a self-supervised objective, the authors show that diffusion models can become effective representation learners. SODA is the first diffusion model to succeed at ImageNet linear-probe classification, and it also handles reconstruction, editing, and synthesis tasks across various datasets. Its latent space is notably disentangled, serving as an effective interface for controlling and manipulating the generated images.
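The encoder-bottleneck-decoder flow described above can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the real SODA uses a ResNet encoder and a UNet denoiser, whereas here single linear maps (`W_enc`, `W_dec`) and the function names are hypothetical stand-ins chosen to show the information flow — a compact latent `z` from the source view conditions the denoising of a noised target view.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny stand-ins for SODA's components: a real system would use
# a ResNet encoder and a UNet denoiser; linear maps illustrate the flow only.
D_IMG, D_LATENT = 64, 8  # flattened view size, bottleneck size (D_LATENT << D_IMG)

W_enc = rng.normal(0, 0.1, (D_LATENT, D_IMG))           # encoder: view -> latent z
W_dec = rng.normal(0, 0.1, (D_IMG, D_IMG + D_LATENT))   # denoiser: (noisy, z) -> noise estimate

def encode(source_view):
    """Compress the source view through the bottleneck into a compact latent z."""
    return W_enc @ source_view

def denoise_step_loss(source_view, target_view, alpha_bar=0.5):
    """One training step: denoise a noised *target* view guided by z from the
    *source* view -- novel view synthesis as the self-supervised objective."""
    z = encode(source_view)
    eps = rng.normal(size=target_view.shape)                 # sampled noise
    noisy = np.sqrt(alpha_bar) * target_view + np.sqrt(1 - alpha_bar) * eps
    eps_hat = W_dec @ np.concatenate([noisy, z])             # latent-conditioned prediction
    return float(np.mean((eps_hat - eps) ** 2))              # standard eps-prediction MSE

src = rng.normal(size=D_IMG)   # e.g. one view of an object
tgt = rng.normal(size=D_IMG)   # a related novel view of the same object
loss = denoise_step_loss(src, tgt)
```

Because all information about the target must pass through the low-dimensional `z`, minimizing this loss pressures the encoder to capture the semantic content shared across views — which is why the latent doubles as a representation for linear probing.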
Publication date: 30 Nov 2023
Project Page: soda-diffusion.github.io
Paper: https://arxiv.org/pdf/2311.17901