Optimistic Multi-Agent Policy Gradient for Cooperative Tasks

The study addresses relative overgeneralization (RO) in cooperative multi-agent learning tasks. RO occurs when agents converge to a suboptimal joint policy because each agent overfits to the suboptimal behavior of the other agents. The authors propose a general framework that enables optimistic updates in Multi-Agent Policy Gradient (MAPG) methods to mitigate this issue. They reshape the advantages used in the policy update with a Leaky ReLU function, whose single hyperparameter (the slope applied to negative advantages) sets the degree of optimism. The proposed method outperforms strong baselines on 13 of the 19 tested tasks.
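To make the advantage reshaping concrete, here is a minimal sketch in Python with NumPy. It assumes the optimism hyperparameter is a slope `eta` in [0, 1] that scales only negative advantages: `eta = 1` recovers the ordinary update, and smaller values make the update more optimistic by discounting "bad" outcomes that may stem from teammates' suboptimal actions. The names `optimistic_advantage`, `clipped_surrogate_loss`, and `eta` are hypothetical. Pairing the reshaped advantage with a PPO-style clipped objective is an illustrative assumption (the project is named "optimappo", which suggests a MAPPO base), not the authors' code.

```python
import numpy as np


def optimistic_advantage(adv: np.ndarray, eta: float) -> np.ndarray:
    """Leaky-ReLU reshaping of advantages (illustrative sketch, not the paper's code).

    eta is a hypothetical name for the single optimism hyperparameter:
    eta = 1.0 leaves advantages unchanged (standard policy-gradient update),
    eta = 0.0 zeroes out every negative advantage (maximal optimism).
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    # Non-negative advantages pass through unchanged; negative ones are scaled by eta.
    return np.where(adv >= 0.0, adv, eta * adv)


def clipped_surrogate_loss(ratio, adv, eta: float, clip: float = 0.2) -> float:
    """Hypothetical PPO-style clipped loss computed on the reshaped advantages."""
    a = optimistic_advantage(np.asarray(adv, dtype=float), eta)
    ratio = np.asarray(ratio, dtype=float)  # pi_new(a|s) / pi_old(a|s) per sample
    unclipped = ratio * a
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * a
    # Negate because optimizers minimize, while the surrogate is maximized.
    return float(-np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    adv = np.array([2.0, -1.0, 0.5, -4.0])
    print(optimistic_advantage(adv, 0.1))  # negative entries shrunk by a factor of 10
    print(clipped_surrogate_loss(np.ones(4), adv, 0.1))
```

With `eta = 0.1`, the advantages `[2.0, -1.0, 0.5, -4.0]` become `[2.0, -0.1, 0.5, -0.4]`, so a teammate's poor action pulls the policy away from a promising joint action far less than it would under the unreshaped update.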

Publication date: 3 Nov 2023
Project Page: https://github.com/wenshuaizhao/optimappo
Paper: https://arxiv.org/pdf/2311.01953
