This academic article is about Factorized Multi-Agent MiniMax Q-Learning (FM3Q), a new framework for two-team zero-sum Markov games. The authors identify inefficiencies in existing methods and propose the individual-global-minimax (IGMM) principle, which ensures consistency between the team-level minimax behavior and each agent's individual greedy behavior. Under this principle, FM3Q factorizes the joint minimax Q-function into individual Q-functions, enabling more efficient learning, and the authors also present an online learning algorithm with neural networks to implement FM3Q. Theoretical analysis and empirical evaluation demonstrate FM3Q's superior performance and learning efficiency in two-team zero-sum Markov games.
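To make the factorization idea concrete, below is a minimal PyTorch sketch of one simple way to satisfy an IGMM-style condition: if the joint Q is an additive (VDN-style) sum of individual Qs, then the joint minimax over team actions decouples into per-agent greedy choices (argmax for the maximizing team, argmin for the minimizing team). All class and function names here are illustrative assumptions, and the paper's actual factorization and network architecture may differ.

```python
import torch
import torch.nn as nn

class IndividualQ(nn.Module):
    """Per-agent utility network Q_i(obs_i, a_i). Name and architecture
    are illustrative, not taken from the paper."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # shape: (batch, n_actions)

def joint_minimax_value(q_max_team, q_min_team, obs_max, obs_min):
    """Under an additive factorization Q_tot = sum_i Q_i, the joint
    minimax value decouples: maximize each Q_i for the max team and
    minimize each Q_j for the min team, then sum the results."""
    v = torch.zeros(())
    for q_net, obs in zip(q_max_team, obs_max):
        v = v + q_net(obs).max(dim=-1).values   # max-team agents act greedily (argmax)
    for q_net, obs in zip(q_min_team, obs_min):
        v = v + q_net(obs).min(dim=-1).values   # min-team agents act adversarially (argmin)
    return v  # IGMM-consistent: individual greedy actions realize the joint minimax
```

In an online training loop, this decoupled minimax value could serve as the bootstrapping target of a minimax Bellman update (roughly r + γ·V of the next state), in the spirit of the paper's algorithm; the exact loss, replay, and target-network details are given in the paper itself.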

Publication date: 2 Feb 2024
Project Page: Unavailable
Paper: https://arxiv.org/pdf/2402.00738