What Large Language Models Bring to Text-rich VQA?
The study investigates the advantages and bottlenecks of Large Language Models (LLMs) in addressing Text-rich Visual Question Answering (VQA) tasks. These tasks involve both image comprehension and text recognition. The…