The paper presents ManipLLM, a system that improves robotic manipulation by leveraging Multimodal Large Language Models (MLLMs). Traditional learning-based manipulation methods often struggle to generalize, especially across a wide range of object categories. ManipLLM addresses this by drawing on the robust reasoning capabilities of MLLMs to improve both stability and generalization. Rather than fine-tuning the full model, the system trains only injected adapters, preserving the MLLM's inherent common sense and reasoning ability while equipping it for manipulation. The approach incorporates object category understanding, affordance prior reasoning, and object-centric pose prediction. For real-world deployment, a test-time adaptation (TTA) strategy is used so the model can better adapt to the configuration of the current scene.
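As a rough illustration of the adapter-based fine-tuning idea, the sketch below freezes a pretrained backbone and trains only small injected adapter modules. The module names, bottleneck size, and stand-in backbone are assumptions for illustration, not ManipLLM's actual architecture.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck adapter injected after a frozen transformer block (hypothetical design)."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behaviour largely intact at initialization.
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a frozen backbone block with a trainable adapter."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


def build_finetunable_model(backbone: nn.Sequential, dim: int) -> nn.Module:
    # Freeze every pretrained parameter so the MLLM's reasoning ability is preserved.
    for p in backbone.parameters():
        p.requires_grad = False
    # Inject adapters; only these are updated during manipulation fine-tuning.
    return nn.Sequential(*[AdaptedBlock(layer, dim) for layer in backbone])


if __name__ == "__main__":
    # Stand-in "backbone": a stack of linear layers (purely illustrative).
    backbone = nn.Sequential(*[nn.Linear(32, 32) for _ in range(4)])
    model = build_finetunable_model(backbone, dim=32)
    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print(f"trainable tensors (adapters only): {len(trainable)}")
```

Only the adapter parameters receive gradient updates here, which mirrors the general motivation stated above: keep the pretrained reasoning intact while adding a lightweight, task-specific capacity for manipulation.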

 

Publication date: 24 Dec 2023
Project Page: https://sites.google.com/view/manipllm
Paper: https://arxiv.org/pdf/2312.16217