GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
The paper introduces a pipeline that enhances a Vision Language Model, GPT-4V(ision), by incorporating human action observations to facilitate robotic manipulation. This system analyzes videos of humans performing tasks and…