This article presents a new control framework that uses vision-language models (VLMs) for multiple tasks and robots. The authors combine the vision-language model CLIP with randomized control. The framework aims to reduce the high cost of learning control policies for tasks and robots that are not included in the training environment. The authors verify the effectiveness of the proposed method through a multitask simulation and a real-robot experiment.
Publication date: 18 Jan 2024
Project Page: http://www.tandfonline.com/arXiv:2401.10085v1
Paper: https://arxiv.org/pdf/2401.10085
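The summary above does not spell out how CLIP features drive the controller. As a rough illustration only, assume that "randomized control" means sampling random candidate actions and keeping whichever one makes the resulting scene image most similar to the text instruction in CLIP feature space. Under that assumption, the loop can be sketched as a greedy random search. The encoder `toy_image_encoder` and the hand-picked goal vector are hypothetical stand-ins for real CLIP image and text embeddings. None of these names come from the paper, and the sketch is not the authors' algorithm.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity, the score CLIP uses to compare image and text features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_image_encoder(state):
    # Hypothetical stand-in for "render a camera image of the robot state and
    # run it through the CLIP image encoder". Not a real CLIP call.
    x, y = state
    return [1.0, x, y]


def randomized_control(text_features, image_encoder, state,
                       steps=200, samples=16, step_size=0.1, seed=0):
    """Greedy random search (assumed interpretation of 'randomized control'):
    at each step, sample random actions and keep the best one only if it
    increases image-text similarity."""
    rng = random.Random(seed)
    score = cosine_similarity(image_encoder(state), text_features)
    for _ in range(steps):
        best_state, best_score = state, score
        for _ in range(samples):
            action = (rng.uniform(-step_size, step_size),
                      rng.uniform(-step_size, step_size))
            candidate = (state[0] + action[0], state[1] + action[1])
            cand_score = cosine_similarity(image_encoder(candidate), text_features)
            if cand_score > best_score:
                best_state, best_score = candidate, cand_score
        state, score = best_state, best_score
    return state, score


# Hypothetical text embedding for an instruction such as "move to the mark".
goal_features = [1.0, 1.0, 2.0]
final_state, final_score = randomized_control(goal_features, toy_image_encoder, (0.0, 0.0))
print(final_state, final_score)
```

Because the search never queries a learned policy, only the similarity score, swapping in a different robot or instruction changes the encoder inputs rather than requiring retraining. This is the cost argument the summary makes, illustrated here under the stated assumptions.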