The paper tackles the challenge of generalizing neural networks to domains not seen during training. The authors introduce a simple framework for domain-generalized semantic segmentation that uses language as the source of randomization. The framework rests on three key components: preserving the intrinsic robustness of CLIP through minimal fine-tuning, language-driven local style augmentation, and randomization by locally mixing the source and augmented styles during training. The approach shows promising results across various domain generalization benchmarks.
Publication date: 30 Nov 2023
Project Page: https://astra-vision.github.io/FAMix
Paper: https://arxiv.org/pdf/2311.17922
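
To make the third component more concrete, here is a minimal sketch of what "locally mixing source and augmented styles" could look like on a low-level feature map, assuming AdaIN-style channel statistics and a per-patch random mixing coefficient. The function name `local_style_mix`, the `grid` patch layout, and the precomputed `aug_mu`/`aug_sigma` statistics (e.g., mined with CLIP text prompts) are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def local_style_mix(feat, aug_mu, aug_sigma, grid=2, eps=1e-6):
    """Illustrative local style mixing on a feature map.

    feat:              (B, C, H, W) low-level source features
    aug_mu, aug_sigma: (B, C) channel statistics of a language-driven
                       augmented style (assumed precomputed elsewhere)
    grid:              the map is split into grid x grid patches; each
                       patch mixes source and augmented statistics with
                       its own random coefficient (style randomization)
    """
    B, C, H, W = feat.shape
    out = feat.clone()
    ph, pw = H // grid, W // grid
    for i in range(grid):
        for j in range(grid):
            patch = feat[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            # per-patch source statistics
            mu = patch.mean(dim=(2, 3), keepdim=True)
            sigma = patch.std(dim=(2, 3), keepdim=True) + eps
            # random mixing coefficient in [0, 1] for this patch
            lam = torch.rand(B, 1, 1, 1, device=feat.device)
            mix_mu = lam * mu + (1 - lam) * aug_mu.view(B, C, 1, 1)
            mix_sigma = lam * sigma + (1 - lam) * aug_sigma.view(B, C, 1, 1)
            # AdaIN-style renormalization to the mixed statistics
            out[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = \
                (patch - mu) / sigma * mix_sigma + mix_mu
    return out
```

In this sketch, each local patch is pushed toward a randomly weighted blend of its own style and a language-derived style, which is one plausible reading of how the method exposes the segmentation network to diverse styles during training.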