HumanTOMATO: Text-aligned Whole-body Motion Generation

The article presents HumanTOMATO, a framework for generating whole-body motion from textual descriptions. Earlier text-to-motion models often neglect fine-grained control of the hands and face, which realistic motion depends on, and they struggle to align the generated motion with the input text. HumanTOMATO addresses both issues with two components: a Holistic Hierarchical VQ-VAE paired with a Hierarchical-GPT for detailed body and hand motion reconstruction, and a pre-trained text-motion-alignment model that keeps the output consistent with the description. The result is more realistic, better text-aligned motion generation.
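To give a feel for the vector-quantization step that any VQ-VAE relies on, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy codebooks, the two-dimensional latents, and the names `quantize` and `encode_frame` are all hypothetical, and the use of separate hand and body codebooks is an assumption made for illustration. The sketch only shows the core idea of replacing a continuous latent vector with the index of its nearest codebook entry, which yields discrete tokens that a GPT-style model could then predict.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to `latent`."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(latent, codebook[i]),
    )


# Hypothetical toy codebooks (assumption: one for hands, one for body).
# Real codebooks would be learned and far larger.
HAND_CODEBOOK: List[Vector] = [[0.0, 0.0], [1.0, 1.0]]
BODY_CODEBOOK: List[Vector] = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]


def encode_frame(hand_latent: Vector, body_latent: Vector) -> Tuple[int, int]:
    """Map one frame's hand and body latents to a pair of discrete tokens."""
    hand_token = quantize(hand_latent, HAND_CODEBOOK)
    body_token = quantize(body_latent, BODY_CODEBOOK)
    return hand_token, body_token


if __name__ == "__main__":
    print(encode_frame([0.9, 0.8], [0.4, 0.6]))
```

In this toy setup each motion frame becomes a pair of integer tokens. How the actual framework structures its codebooks and conditions one level on another is described in the paper linked below.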

Publication date: 19 Oct 2023
Project Page: https://lhchen.top/HumanTOMATO
Paper: https://arxiv.org/pdf/2310.12978
