The article discusses the limitations of Large Language Models (LLMs) and Large Vision Models (LVMs) in capturing the holistic meaning of multimodal online marketing campaigns. The authors note that existing models, including newer ones, often fail to identify the explicit semantic connections between modalities that shape human interpretation, a capability crucial for effective multimodal marketing. To address this gap, they propose a framework that combines knowledge graphs with Vision-Language Models (VLMs) to improve the prediction of how effective a multimodal marketing campaign will be. The suggested approach enables early detection of potentially persuasive campaigns and contributes to marketing theory.
Publication date: 7 Feb 2024
Project Page: not provided
Paper: https://arxiv.org/pdf/2402.03607