This research presents PromptAlign, a test-time prompt adaptation method that improves the zero-shot generalization of vision-language models such as CLIP. Unlike previous methods, PromptAlign explicitly addresses distribution shift, a primary cause of performance degradation on unseen domains: it aligns the feature statistics of an out-of-distribution (OOD) test sample with those of the source data, minimizing the shift in feature distribution. The method adapts multi-modal prompts at test time using a single test sample. Compared to existing techniques, PromptAlign improves zero-shot top-1 accuracy and consistently improves performance across all datasets in cross-dataset generalization with unseen categories.
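The statistic-alignment idea can be illustrated with a minimal sketch. The summary does not say which statistics are matched or which distance is used, so the choices here are assumptions made for illustration: per-dimension mean and variance of token features, compared with an L1 distance. The names `token_stats` and `alignment_loss` are hypothetical and do not come from the paper's code. The sketch only computes the quantity that would be minimized; the actual method would update the prompts by gradient descent on such a loss.

```python
from statistics import fmean


def token_stats(tokens):
    """Per-dimension mean and population variance over a list of token feature vectors."""
    dims = list(zip(*tokens))  # regroup values by feature dimension
    means = [fmean(d) for d in dims]
    variances = [fmean([(x - m) ** 2 for x in d]) for d, m in zip(dims, means)]
    return means, variances


def alignment_loss(test_tokens, source_mean, source_var):
    """Assumed form of the alignment objective: L1 gap between test and source statistics."""
    mean, var = token_stats(test_tokens)
    d = len(mean)
    mean_gap = sum(abs(a - b) for a, b in zip(mean, source_mean)) / d
    var_gap = sum(abs(a - b) for a, b in zip(var, source_var)) / d
    return mean_gap + var_gap


# Toy example: two 2-dimensional token features from one test sample.
test_tokens = [[1.0, 2.0], [3.0, 6.0]]
# Placeholder source statistics; in practice these would be precomputed offline.
source_mean, source_var = [0.0, 0.0], [1.0, 1.0]
loss = alignment_loss(test_tokens, source_mean, source_var)
```

A loss of zero means the test sample's token statistics already match the source statistics; larger values indicate a larger shift in feature distribution under the assumed distance.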
Publication date: 2 Nov 2023
Project Page: https://jameelhassan.github.io/promptalign/
Paper: https://arxiv.org/pdf/2311.01459