A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

This paper provides a comprehensive survey of prompt engineering on three types of vision-language models: multimodal-to-text generation models, image-text matching models, and text-to-image generation models. Prompt engineering, the technique of augmenting a pre-trained model with task-specific hints or prompts, has been well studied in natural language processing and has recently been investigated in vision-language modeling. Yet systematic overviews of prompt engineering for pre-trained vision-language models are still lacking, a gap this study aims to bridge.

Publication date: July 24, 2023
Project Page: https://github.com/JindongGu/Awesome-Prompting-on-Vision-Language-Model/
Paper: https://arxiv.org/pdf/2307.12980.pdf
