The article investigates the problem of Q-value estimation divergence in offline reinforcement learning (RL). The authors identify a fundamental pattern, ‘self-excitation’, as the primary cause of this divergence, and propose the Self-Excite Eigenvalue Measure (SEEM), a novel metric based on the Neural Tangent Kernel (NTK), to measure this property of the Q-network during training. SEEM explains how divergence emerges and can predict, at an early stage of training, whether training will diverge. The authors also suggest resolving divergence by regularizing the neural network’s generalization behavior, identifying LayerNorm as an effective solution.
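To illustrate the proposed fix, here is a minimal sketch of a Q-network with LayerNorm applied to its hidden layer. The architecture (a two-layer MLP with hypothetical weights `W1`, `W2`) is an assumption for illustration, not the paper's exact network; only the placement of LayerNorm reflects the paper's finding.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean and unit variance,
    # bounding the scale of intermediate activations.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def q_network(state, W1, W2):
    # Hypothetical two-layer Q-network; inserting LayerNorm before the
    # output head regularizes generalization, the mechanism the paper
    # identifies as preventing self-excited Q-value divergence.
    h = np.maximum(0.0, state @ W1)   # ReLU hidden layer
    h = layer_norm(h)                 # LayerNorm on hidden features
    return h @ W2                     # one Q-value per action

rng = np.random.default_rng(0)
state = rng.normal(size=(4, 8))       # batch of 4 states, state dim 8
W1 = rng.normal(size=(8, 32)) * 0.1   # hidden width 32 (assumed)
W2 = rng.normal(size=(32, 2)) * 0.1   # 2 actions (assumed)
q = q_network(state, W1, W2)
print(q.shape)  # (4, 2)
```

Because LayerNorm rescales hidden features to a fixed norm, Q-value estimates can no longer grow without bound through the bootstrapped targets, which is the self-excitation loop SEEM detects.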


Publication date: 6 Oct 2023
Project Page: https://github.com/yueyang130/SEEM
Paper: https://arxiv.org/pdf/2310.04411