Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
The article investigates why Q-value estimates diverge in offline reinforcement learning (RL). The authors identify a fundamental pattern, ‘self-excitation’, as the primary cause of this divergence. They propose…