The article introduces GestureGPT, a novel zero-shot gesture understanding and grounding framework. It leverages large language models (LLMs) to interpret free-form gestures that are not predefined in the system and ground them to GUI elements or system functions. The framework is built on a dual-agent dialogue in which gesture descriptions are interpreted and the interaction context is clarified. It was evaluated in real-world scenarios such as video streaming and smart home IoT control, showing promising gesture-understanding results.
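
To make the dual-agent idea concrete, the sketch below shows one plausible shape of such a dialogue: one LLM-backed agent interprets a natural-language gesture description, while a second agent holds the interaction context (the GUI elements or IoT functions currently available) and answers clarification questions until a target is grounded. This is a minimal illustrative sketch under assumptions, not the paper's implementation; `call_llm`, the prompts, and the `GestureAgent`/`ContextAgent` classes are hypothetical stand-ins.

```python
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM chat API.
    Replace with a real client; here it just echoes for demonstration."""
    return f"[LLM response to: {prompt[:60]}...]"


@dataclass
class ContextAgent:
    """Holds the current interaction context: the GUI elements or
    system/IoT functions a gesture could plausibly be grounded to."""
    available_targets: list[str]

    def answer(self, question: str) -> str:
        # Ask the LLM to answer a clarification question given the live context.
        prompt = (
            "You manage the interaction context.\n"
            f"Available targets: {', '.join(self.available_targets)}\n"
            f"Question from the gesture agent: {question}\n"
            "Answer concisely."
        )
        return call_llm(prompt)


@dataclass
class GestureAgent:
    """Interprets a natural-language gesture description and grounds it
    to one of the targets exposed by the context agent."""
    max_turns: int = 3

    def ground(self, gesture_description: str, context: ContextAgent) -> str:
        dialogue: list[str] = []
        for _ in range(self.max_turns):
            prompt = (
                "You interpret hand gestures described in natural language.\n"
                f"Gesture description: {gesture_description}\n"
                f"Dialogue so far: {dialogue}\n"
                "Either ask ONE clarification question about the context, "
                "or output FINAL: <target> naming the grounded target."
            )
            reply = call_llm(prompt)
            if reply.startswith("FINAL:"):
                return reply.removeprefix("FINAL:").strip()
            # Otherwise treat the reply as a clarification question.
            dialogue.append(f"Q: {reply}")
            dialogue.append(f"A: {context.answer(reply)}")
        # Illustrative fallback if no decision is reached within the turn limit.
        return context.available_targets[0]


if __name__ == "__main__":
    context = ContextAgent(available_targets=["play/pause", "volume up", "skip ad"])
    agent = GestureAgent()
    description = "Right hand raised, palm facing forward, fingers together."
    print(agent.ground(description, context))
```

In an end-to-end setting, the grounded target would then be dispatched to the GUI or smart-home controller; the turn limit and fallback above are purely for illustration.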

Publication date: 20 Oct 2023
Project Page: https://arxiv.org/abs/2310.12821v2
Paper: https://arxiv.org/pdf/2310.12821