The study presents SignVQNet, a gloss-free Sign Language Production (SLP) model based on Vector Quantization. The model converts continuous sign poses into discrete tokens, which enables truly autoregressive generation without auxiliary information at inference time. Because the output is a sequence of discrete tokens, the model also supports beam search, a decoding method common in Natural Language Processing (NLP). In addition, it introduces latent-level alignment, which directly associates linguistic features with sign pose features. In the reported evaluation, SignVQNet consistently outperformed the other SLP models it was compared against.
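
As a rough illustration of the tokenization idea described above, the sketch below maps continuous pose vectors to the indices of their nearest codebook entries. It is not the SignVQNet implementation: the function names `quantize` and `dequantize`, the hand-written 2-D codebook, and the toy pose values are all assumptions made for this example. In a real model the codebook would be learned and the poses would be high-dimensional encoder outputs.

```python
import numpy as np


def quantize(poses, codebook):
    """Return, for each pose vector, the index of its nearest codebook entry.

    poses:    array of shape (T, D), one continuous vector per frame
    codebook: array of shape (K, D), one vector per discrete token
    """
    # Pairwise squared Euclidean distances, shape (T, K).
    diffs = poses[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)


def dequantize(tokens, codebook):
    """Look up the codebook vector for each token index."""
    return codebook[tokens]


# Hypothetical toy data (illustration only): 3 codebook entries, 4 frames.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
poses = np.array([[0.1, -0.1], [0.9, 1.2], [1.8, 0.1], [1.1, 0.9]])

tokens = quantize(poses, codebook)
recon = dequantize(tokens, codebook)
print(tokens.tolist())  # [0, 1, 2, 1]
```

Once poses are represented as token indices like these, a sequence model can predict them one step at a time, and standard decoding strategies for discrete sequences, such as beam search, become applicable.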

Publication date: 21 Sep 2023
Project Page: https://arxiv.org/abs/2309.12179
Paper: https://arxiv.org/pdf/2309.12179