The article discusses the use of pretrained Vision-Language Models (VLMs), such as CLIP, as zero-shot reward models (RMs) for reinforcement learning (RL). The authors propose a method, called VLM-RMs, that uses these models to specify tasks via natural language, eliminating the need for manually specified reward functions. They test the method on a simulated MuJoCo humanoid, which learns complex tasks such as kneeling and doing the splits from single-sentence text prompts. The study also finds that larger VLMs, trained with more compute and data, make better reward models.
Publication date: 19 Oct 2023
Project Page: https://sites.google.com/view/vlm-rm
Paper: https://arxiv.org/pdf/2310.12921
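At its core, the reward for a state is the CLIP cosine similarity between a rendered frame of the environment and the text prompt describing the task. Below is a minimal sketch of that idea using the Hugging Face transformers CLIP API; the checkpoint name, the `clip_reward` helper, and the example prompt are illustrative assumptions rather than the authors' code, and the regularization step described in the paper is omitted.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint choice; any CLIP-style checkpoint would do for this sketch.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
model.eval()

@torch.no_grad()
def clip_reward(frame, prompt: str) -> float:
    """Score a rendered environment frame (PIL image or HxWx3 uint8 array)
    against a natural-language task description via CLIP cosine similarity."""
    inputs = processor(text=[prompt], images=frame,
                       return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    # Normalize embeddings so the dot product is the cosine similarity.
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum().item()

# Hypothetical usage: reward the agent for matching the prompt at each step.
# reward = clip_reward(env.render(), "a humanoid robot kneeling")
```

In this framing, the RL algorithm itself is unchanged; the VLM simply replaces the hand-written reward function with a language-conditioned score on rendered observations.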