This paper presents a method for improving model-based reinforcement learning (MBRL) algorithms by jointly addressing model shift and model bias. The authors propose an optimization objective that unifies the two and can be adjusted adaptively during training, with the aim of guaranteeing performance improvement while avoiding model overfitting. Based on these principles, they develop the algorithm USB-PO (Unified model Shift and model Bias Policy Optimization), which demonstrates superior performance on several benchmark tasks.
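
The paper's exact objective and adaptive rule are not reproduced in this summary, so the sketch below is only a hedged illustration of the general idea: fine-tuning a dynamics model under a unified trade-off between how far the model moves (model shift) and how well it fits real transitions (model bias). The toy linear model, the `model_shift`, `model_bias`, and `fine_tune_step` helpers, and the heuristic adjustment of the coefficient `lam` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_bias(model_w, states, next_states):
    """One-step prediction error of the model on real transitions (illustrative)."""
    return float(np.mean((states @ model_w - next_states) ** 2))

def model_shift(old_w, new_w, states):
    """Divergence between predictions of consecutive model snapshots (illustrative)."""
    return float(np.mean((states @ old_w - states @ new_w) ** 2))

def fine_tune_step(old_w, states, next_states, lr=0.1, lam=1.0):
    """One update on a unified objective: bias(new_w) + lam * shift(old_w, new_w).

    Here lam acts as a trade-off coefficient that damps how far the model moves;
    starting from old_w the shift term's gradient is zero, so in this sketch lam
    simply scales the effective step along the bias gradient.
    """
    grad_bias = 2.0 * states.T @ (states @ old_w - next_states) / len(states)
    return old_w - (lr / (1.0 + lam)) * grad_bias

# Toy data: real dynamics next_state = state @ true_w + noise.
true_w = np.array([[0.9, 0.1], [0.0, 0.8]])
states = rng.normal(size=(256, 2))
next_states = states @ true_w + 0.01 * rng.normal(size=(256, 2))

w, lam = np.zeros((2, 2)), 1.0
for step in range(50):
    w_new = fine_tune_step(w, states, next_states, lam=lam)
    shift = model_shift(w, w_new, states)
    bias = model_bias(w_new, states, next_states)
    # Heuristic adaptive adjustment: damp updates when the model shifts more
    # than it reduces bias, loosen otherwise (a stand-in for the paper's rule).
    lam = lam * 1.1 if shift > bias else lam * 0.9
    w = w_new
```

In the paper itself, as summarized above, the adjustment is derived from an objective that unifies model shift and model bias so as to theoretically guarantee performance improvement, rather than from the simple heuristic used in this sketch.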


Publication date: 22 Sep 2023
Project Page: https://arxiv.org/abs/2309.12671v1
Paper: https://arxiv.org/pdf/2309.12671