This paper presents SAM-G, a framework that leverages the Segment Anything Model (SAM) to improve the generalization abilities of visual reinforcement learning (RL) agents. SAM-G uses image features from DINOv2 and SAM to find correspondences, which serve as point prompts to SAM; SAM then produces high-quality masked images of task-relevant objects for the agent. Experiments show that SAM-G significantly improves visual generalization without altering the RL agent's architecture, achieving substantial gains over prior methods in challenging video-background settings.
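To make the pipeline concrete, here is a minimal sketch of how one step of this masking process could be wired together. It assumes a `segment_anything` `SamPredictor` with weights already loaded and a user-supplied dense feature extractor standing in for the fused DINOv2/SAM features; the names `sam_g_mask`, `feature_fn`, and `ref_feat` are illustrative, not from the authors' code.

```python
# Hypothetical sketch of a SAM-G-style masking step (not the authors' implementation).
import numpy as np
import torch

def sam_g_mask(image, ref_feat, feature_fn, predictor):
    """Mask an observation with SAM, prompted by feature correspondence.

    image:      HxWx3 uint8 observation from the RL environment.
    ref_feat:   (C,) feature vector of the target object, taken once from
                a reference frame (assumption: an averaged patch feature).
    feature_fn: callable mapping image -> (C, h, w) dense patch features
                (e.g. DINOv2/SAM features, fused as described in the paper).
    predictor:  a segment_anything SamPredictor with weights loaded.
    """
    feat = feature_fn(image)                       # (C, h, w) dense features
    C, h, w = feat.shape
    flat = feat.reshape(C, -1)                     # (C, h*w)
    # Cosine similarity between the reference feature and every patch.
    sim = torch.cosine_similarity(ref_feat[:, None], flat, dim=0)
    py, px = divmod(int(sim.argmax()), w)          # best-matching patch location
    # Scale patch coordinates back to pixel coordinates.
    H, W = image.shape[:2]
    point = np.array([[px * W / w, py * H / h]])   # (1, 2) point prompt
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=point,
        point_labels=np.array([1]),                # 1 = foreground point
        multimask_output=True,
    )
    mask = masks[int(scores.argmax())]             # keep the highest-scoring mask
    return image * mask[..., None]                 # masked observation for the agent
```

Because the masking happens purely at the observation level, the agent itself is untouched: the same function can wrap any visual RL algorithm's observation stream.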

Publication date: 28 Dec 2023
Project Page: yanjieze.com/SAM-G
Paper: https://arxiv.org/pdf/2312.17116