Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

This academic article addresses the difficulty machines have in understanding defeasible commonsense norms in a visual context, that is, norms whose applicability depends on the situation: reading a book is normally fine, but not while driving. The authors introduce a new multimodal benchmark, NormLens, consisting of 10,000 human judgments with free-form explanations covering 2,000 multimodal situations. The benchmark is used to measure how well models align with average human judgment and how well they can explain their predicted judgments. The study finds that current state-of-the-art models are not well aligned with human annotations. To narrow this gap, the authors propose distilling social commonsense knowledge from large language models.

Publication date: 17 Oct 2023
Project Page: https://seungjuhan.me/normlens
Paper: https://arxiv.org/pdf/2310.10418
