Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches

The availability of online videos has transformed how people access information. In the medical field, instructional videos can offer visual answers to health-related questions. This paper proposes an approach for creating two large-scale datasets, HealthVidQA-CRF and HealthVidQA-Prompt, to address the scarcity of such datasets in the medical domain. The authors also propose monomodal and multimodal approaches that provide visual answers from medical videos in response to natural language questions. The findings suggest that these datasets can improve performance on medical visual answer localization tasks.

Publication date: 21 Sep 2023
Project Page: https://arxiv.org/abs/2309.12224v1
Paper: https://arxiv.org/pdf/2309.12224


