GestureGPT is a zero-shot gesture understanding and grounding framework that leverages large language models (LLMs). It formulates natural-language gesture descriptions from hand landmark coordinates extracted from gesture videos and feeds them into a dual-agent dialogue system: a gesture agent deciphers the descriptions and asks questions about the interaction context, which a context agent organizes and provides. From this dialogue, the gesture agent discerns the user's intent and grounds it to an interactive function. The system showed high grounding accuracy in two real-world settings: video streaming and smart home IoT control.
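Below is a minimal Python sketch of what such a dual-agent dialogue loop could look like, not the paper's implementation: the `llm` callable, the agent classes, and the `stub_llm` backend are hypothetical placeholders standing in for whatever chat model, prompts, and context sources GestureGPT actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List

# `LLM` stands in for any chat-completion backend: it takes a prompt string
# and returns the model's reply as a string (assumed interface).
LLM = Callable[[str], str]


@dataclass
class GestureAgent:
    llm: LLM

    def interpret(self, gesture_description: str, ask_context: Callable[[str], str],
                  functions: List[str], max_turns: int = 3) -> str:
        """Iteratively query the context agent, then ground intent to one function."""
        dialogue = f"Gesture description: {gesture_description}\n"
        for _ in range(max_turns):
            question = self.llm(
                dialogue + "If more interaction context is needed, ask ONE question; "
                           "otherwise reply DONE."
            )
            if question.strip() == "DONE":
                break
            # Send the question to the context agent and record its answer.
            dialogue += f"Q: {question}\nA: {ask_context(question)}\n"
        return self.llm(
            dialogue + "Ground the user's intent to exactly one of these functions: "
            + ", ".join(functions)
        )


@dataclass
class ContextAgent:
    llm: LLM
    context: str  # e.g. UI state, gaze target, device list

    def answer(self, question: str) -> str:
        return self.llm(f"Context:\n{self.context}\nQuestion: {question}\nAnswer briefly.")


def stub_llm(prompt: str) -> str:
    """Placeholder backend so the sketch runs without an API key."""
    if "ask ONE question" in prompt:
        return "DONE"
    return "pause_playback"


if __name__ == "__main__":
    context_agent = ContextAgent(llm=stub_llm,
                                 context="Video player is playing; cursor over timeline.")
    gesture_agent = GestureAgent(llm=stub_llm)
    choice = gesture_agent.interpret(
        "Open palm facing the screen, fingers spread, held still for about one second.",
        ask_context=context_agent.answer,
        functions=["pause_playback", "volume_up", "next_video"],
    )
    print(choice)  # -> pause_playback (with the stub backend)
```

Passing the context agent in as a plain callable keeps the gesture agent agnostic to where the interaction context comes from, which mirrors the separation of roles described above.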


Publication date: 19 Oct 2023
Project Page: https://arxiv.org/abs/2310.12821v1
Paper: https://arxiv.org/pdf/2310.12821