The article introduces the Critic-Guided Decision Transformer (CGDT), a novel approach to offline reinforcement learning (RL). Traditional Return-Conditioned Supervised Learning (RCSL) struggles in stochastic environments, where the return sampled from a single trajectory can diverge widely from the expected return over the distribution of possible futures. CGDT addresses this by combining the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer: a learned value function (the critic) guides training so that the expected returns of predicted actions align with the conditioning target returns. This reconciles the inconsistency between returns sampled within individual trajectories and expected returns across multiple trajectories. Empirical evaluations show that CGDT outperforms traditional RCSL methods on stochastic environments and on D4RL benchmark datasets.
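For intuition, here is a minimal, hypothetical PyTorch sketch of the critic-guidance idea: alongside the usual return-conditioned action loss, a learned critic Q(s, a) penalizes actions whose expected return deviates from the conditioning target return. All names (`policy`, `critic`, `lambda_critic`, `training_step`) and the exact losses are illustrative assumptions, not the authors' implementation, which uses a Decision Transformer backbone rather than the MLP stand-in below.

```python
# Hypothetical sketch of critic-guided return conditioning; not the paper's code.
import torch
import torch.nn as nn

state_dim, action_dim, hidden = 4, 2, 64

# Stand-in for the Decision Transformer policy: maps (state, target return) -> action.
policy = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.ReLU(),
                       nn.Linear(hidden, action_dim))

# Learned critic Q(s, a): estimates the expected return of taking action a in state s.
critic = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                       nn.Linear(hidden, 1))

opt = torch.optim.Adam(list(policy.parameters()) + list(critic.parameters()), lr=3e-4)
lambda_critic = 1.0  # weight of the critic-guidance term (assumed)

def training_step(states, actions, returns_to_go, mc_returns):
    """One gradient step on a batch of offline transitions.

    states:        (B, state_dim)
    actions:       (B, action_dim)  dataset actions
    returns_to_go: (B, 1)           target returns used for conditioning
    mc_returns:    (B, 1)           sampled returns used to fit the critic
    """
    # Standard RCSL term: imitate the dataset action given the target return.
    pred_action = policy(torch.cat([states, returns_to_go], dim=-1))
    bc_loss = nn.functional.mse_loss(pred_action, actions)

    # Fit the critic on sampled returns (a simple regression stand-in for
    # whatever value-learning objective the paper actually uses).
    q_data = critic(torch.cat([states, actions], dim=-1))
    critic_loss = nn.functional.mse_loss(q_data, mc_returns)

    # Critic guidance: push the policy toward actions whose *expected* return
    # matches the conditioning target, reconciling sampled vs. expected returns.
    # (In practice the critic would typically be trained separately and frozen here.)
    q_pred = critic(torch.cat([states, pred_action], dim=-1))
    guidance_loss = nn.functional.mse_loss(q_pred, returns_to_go)

    loss = bc_loss + critic_loss + lambda_critic * guidance_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example usage on random data (batch of 32 transitions).
B = 32
loss = training_step(torch.randn(B, state_dim), torch.randn(B, action_dim),
                     torch.randn(B, 1), torch.randn(B, 1))
```

The key design choice this sketch illustrates is that the guidance term grades the policy's own predicted action with the critic, rather than only imitating dataset actions, which is what lets the conditioning target behave like an expected return instead of a single sampled one.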
Publication date: 22 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.13716