Optimistic Multi-Agent Policy Gradient for Cooperative Tasks
The study focuses on the problem of relative overgeneralization (RO) in cooperative multi-agent learning tasks. RO occurs when agents converge towards a suboptimal joint policy due to overfitting to suboptimal…
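Relative overgeneralization is easiest to see in a small two-agent matrix game. The sketch below is illustrative only. The payoff matrix follows the pattern of the classic "climbing game". The function names (`averaged_value`, `optimistic_value`, `greedy`) are hypothetical. The "optimistic" evaluation, which takes the best reward over the partner's actions, is one simple form of optimism and is not necessarily the method used in the study. When each agent scores its own actions by averaging over a partner that is still exploring, the individual action belonging to the optimal joint action looks worst, because miscoordinating around it is heavily penalized.

```python
# Hypothetical two-agent matrix game in the style of the classic
# "climbing game": PAYOFF[i][j] is the shared reward when agent 1
# plays action i and agent 2 plays action j.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]
N = len(PAYOFF)


def averaged_value(action, partner_policy):
    """Expected reward of `action` when the partner samples from partner_policy."""
    return sum(p * PAYOFF[action][j] for j, p in enumerate(partner_policy))


def optimistic_value(action):
    """Reward of `action` assuming the partner coordinates perfectly (max over its actions)."""
    return max(PAYOFF[action])


def greedy(values):
    """Index of the highest value."""
    return max(range(len(values)), key=lambda a: values[a])


# Early in training the partner is still exploring uniformly at random.
uniform = [1 / N] * N

avg = [averaged_value(a, uniform) for a in range(N)]
opt = [optimistic_value(a) for a in range(N)]

# The best joint action in this matrix is (0, 0), with reward 11.
best_joint = max(((i, j) for i in range(N) for j in range(N)),
                 key=lambda ij: PAYOFF[ij[0]][ij[1]])

print("averaged values:", [round(v, 2) for v in avg])
print("averaged choice:", greedy(avg))
print("optimistic choice:", greedy(opt))
print("best joint action:", best_joint)
```

Under uniform exploration the averaged values rank action 0 lowest and favor action 2, so greedy learners drift away from the optimal joint action (0, 0). The optimistic evaluation still picks action 0, which illustrates why optimism is a natural lever against relative overgeneralization.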