Toloka Visual Question Answering Benchmark

The paper presents Toloka Visual Question Answering, a crowdsourced dataset designed to test how well machine learning systems perform on the grounding visual question answering task. In this task, a system must draw a bounding box around the object in an image that correctly answers a given textual question. The paper also describes the data collection process and evaluates current pre-trained and fine-tuned models on the task. None of the machine learning models evaluated so far has outperformed the non-expert crowdsourcing baseline.
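This summary does not say how a predicted bounding box is scored against the reference answer. A common choice for box-grounding tasks is intersection over union (IoU), so the sketch below assumes that metric and a (left, top, right, bottom) pixel box format. The names `Box` and `iou` are illustrative and are not taken from the paper or the dataset.

```python
from typing import Tuple

# Assumed box format: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    # Areas of each box, clamped so malformed boxes contribute zero.
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a predicted box shifted halfway off the reference box.
predicted = (5.0, 0.0, 15.0, 10.0)
reference = (0.0, 0.0, 10.0, 10.0)
score = iou(predicted, reference)
```

Under this assumed metric, a score of 1.0 means the predicted box matches the reference exactly and 0.0 means the boxes do not overlap at all, which makes it straightforward to compare model outputs with crowd annotations on the same scale.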

Publication date: 28 Sep 2023
Project Page: https://arxiv.org/abs/2309.16511
Paper: https://arxiv.org/pdf/2309.16511

bounding box, crowdsourced dataset, Fairness in Machine Learning, Toloka, Visual Question Answering
