The article discusses the theoretical understanding of self-supervised learning (SSL) methods such as SimCLR, CLIP, and VICReg. The authors propose a generative latent variable model to explain the principle behind these methods, showing that discriminative SSL algorithms implicitly induce a latent structure over representations. The model also accounts for the mechanism by which SSL objectives 'pull together' representations of semantically related data and 'push apart' those of unrelated data. The authors report that fitting this model generatively improves performance over previous methods on benchmarks and outperforms discriminative methods on tasks where style information is required.
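To make the 'pull together / push apart' mechanism concrete, the sketch below implements a minimal SimCLR-style NT-Xent contrastive loss in NumPy. This is an illustration of the discriminative SSL objective the article refers to, not the paper's generative model; the function name and batch layout are assumptions for the example.

```python
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    """Minimal NT-Xent (SimCLR-style) contrastive loss.

    z1, z2: (N, d) arrays of representations of two augmented views
    of the same N inputs. Each pair (z1[i], z2[i]) is a positive pair
    that the loss 'pulls together'; all other pairs in the batch act
    as negatives that are 'pushed apart'.
    """
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)            # (2N, d)
    n = z1.shape[0]

    sim = z @ z.T / temperature                     # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-similarity

    # The positive for sample i is its other view: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])

    # Cross-entropy of picking the positive among all 2N-1 candidates.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views of each input map to nearby points, the loss is low; representations that collapse or mix semantically unrelated inputs incur a higher loss.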


Publication date: 5 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.01399