Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
The paper presents two strategies for learning pose-aware representations in Vision Transformers (ViTs). The first strategy, called Pose-aware Attention Block (PAAB), is a plug-and-play ViT block that performs localized attention…
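The excerpt stops before detailing how PAAB's localized attention works. As a rough illustration only (the paper's exact mechanism is not given here), the sketch below assumes "localized attention" means restricting the keys and values of a self-attention step to patch tokens that overlap pose-keypoint regions; the function name, the single-head form, and the identity projections are all simplifications for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def localized_attention(tokens, pose_mask):
    """Single-head self-attention restricted to pose-region tokens.

    tokens:    (N, D) patch embeddings; every token acts as a query.
    pose_mask: (N,) boolean; True where the patch overlaps a pose region.
    Keys/values outside the pose mask are excluded by setting their
    attention logits to -inf before the softmax.
    """
    q = k = v = tokens                    # identity projections for the sketch
    scale = 1.0 / np.sqrt(tokens.shape[-1])
    logits = (q @ k.T) * scale            # (N, N) pairwise scores
    logits[:, ~pose_mask] = -np.inf       # attend only to pose-region keys
    attn = softmax(logits, axis=-1)       # rows sum to 1 over pose tokens
    return attn @ v

# Toy example: 4 patch tokens; patches 1 and 2 cover pose keypoints.
tokens = np.arange(8, dtype=float).reshape(4, 2)
pose_mask = np.array([False, True, True, False])
out = localized_attention(tokens, pose_mask)
```

Each output row is then a convex combination of the pose-region tokens only, so the block's representations are pulled toward pose-relevant patches regardless of where the query patch lies.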