📗 Centralized Q-Learning: define a joint reward (sometimes called a social welfare function) and treat the joint action space \(A = \displaystyle\prod_{i \in I} A_{i}\), with joint actions \(a \in A\), as the action space of a single-agent problem.
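📗 A minimal sketch of tabular centralized Q-learning over the joint action space, assuming a small hypothetical environment interface (env.reset() returning a hashable state, env.step(joint_action) returning the next state, the joint reward, and a done flag); the names env, action_sets, and all hyperparameters are illustrative placeholders, not part of the notes.

```python
import itertools
import random
from collections import defaultdict

def centralized_q_learning(env, action_sets, episodes=1000,
                           alpha=0.1, gamma=0.95, eps=0.1):
    # Joint action space A = prod_i A_i, enumerated explicitly.
    joint_actions = list(itertools.product(*action_sets))
    Q = defaultdict(float)  # Q[(s, a)] with a a joint action
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:  # explore
                a = random.choice(joint_actions)
            else:                      # exploit the joint Q-table
                a = max(joint_actions, key=lambda b: Q[(s, b)])
            s_next, r, done = env.step(a)  # r is the joint (social) reward
            target = r + gamma * max(Q[(s_next, b)] for b in joint_actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```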
📗 Independent Q-Learning: use a single-agent RL algorithm to control each agent separately; this often does not converge, because each agent faces a non-stationary environment created by the other learning agents.
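📗 For contrast, a minimal sketch of independent Q-learning under the same hypothetical interface, except that env.step is assumed to return one reward per agent; each agent learns its own table over its own actions only.

```python
import random
from collections import defaultdict

def independent_q_learning(env, action_sets, episodes=1000,
                           alpha=0.1, gamma=0.95, eps=0.1):
    n = len(action_sets)
    Q = [defaultdict(float) for _ in range(n)]  # one Q-table per agent
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Each agent picks its own action epsilon-greedily from its own table.
            a = tuple(
                random.choice(action_sets[i]) if random.random() < eps
                else max(action_sets[i], key=lambda ai: Q[i][(s, ai)])
                for i in range(n)
            )
            s_next, rewards, done = env.step(a)  # rewards[i] for agent i
            for i in range(n):
                target = rewards[i] + gamma * max(
                    Q[i][(s_next, b)] for b in action_sets[i]
                )
                Q[i][(s, a[i])] += alpha * (target - Q[i][(s, a[i])])
            s = s_next
    return Q  # no convergence guarantee: each agent's target keeps moving
```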
➩ \(V_{i}\left(s\right) = \text{NE}\left(Q\left(s, \cdot\right)\right)\), the value to agent \(i\) of an equilibrium of the stage game \(Q\left(s, \cdot\right)\); the Nash Equilibrium (NE) can be replaced by other game equilibrium concepts (for example Correlated Equilibrium (CE) or Coarse Correlated Equilibrium (CCE)).
📗 The solution (the Nash equilibrium policy, if the learning converges) is called a Markov Perfect Equilibrium (MPE).
📗 Learning with, for example, epsilon-greedy exploration:
➩ With probability \(\varepsilon\), choose a random action; otherwise, sample \(a_{t} \sim \pi\left(s_{t}\right) \in \text{NE}\left(Q\left(s_{t}, \cdot\right)\right)\); then observe the reward \(r_{t}\) and next state \(s_{t+1}\).
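📗 A minimal sketch of this epsilon-greedy step, assuming a hypothetical helper nash_policy(Q, s) that returns an equilibrium mixed policy over joint actions (for zero-sum games it could be the minimax solver sketched further below).

```python
import random

def epsilon_greedy_step(Q, s, joint_actions, nash_policy, eps=0.1):
    # With probability eps, explore with a uniformly random joint action.
    if random.random() < eps:
        return random.choice(joint_actions)
    # Otherwise sample a_t ~ pi(s_t), where pi(s_t) is in NE(Q(s_t, .)).
    pi = nash_policy(Q, s)  # assumed: dict mapping joint action -> probability
    weights = [pi[a] for a in joint_actions]
    return random.choices(joint_actions, weights=weights)[0]
```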
📗 \(\text{NE}\left(Q\left(s, \cdot\right)\right)\) is not unique for general-sum games, so for general-sum MGs the value functions above are not well defined.
📗 The value of \(\text{NE}\left(Q\left(s, \cdot\right)\right)\) is unique for zero-sum games (by the minimax theorem, all equilibria share the same value), so minimax-Q (which uses a minimax solver for the stage game \(Q\left(s, \cdot\right)\)) converges to an MPE.
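📗 A minimal sketch of the stage-game solver inside minimax-Q, phrased as a linear program with scipy.optimize.linprog: the maximizing player's mixed strategy \(x\) solves \(\displaystyle\max_{x} \min_{j} \displaystyle\sum_{i} x_{i} Q\left(s, i, j\right)\), and the optimal objective is the (unique) game value. The matrix Q_s and the matching-pennies example are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_stage_game(Q_s):
    """Return the row player's maximin mixed strategy and the game value
    for the zero-sum stage game with payoff matrix Q_s (rows maximize)."""
    n, m = Q_s.shape
    # Variables: [x_1, ..., x_n, v]; maximize v  <=>  minimize -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # For every column action j: v - sum_i x_i Q_s[i, j] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Probabilities sum to one (the v coefficient is zero).
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Matching pennies: the unique value is 0 with the uniform strategy.
pi, v = minimax_stage_game(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```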
Soccer Game Example
📗 Grid-world soccer game (the numbers reported are the percentage of games won and the episode length).
📗 Most deep MARL models assume partial observability (handled with recurrent units or attention modules) and use either centralized or independent learning (through best-response dynamics, similar to GAN training).
📗 Notes and code adapted from the course taught by Professors Jerry Zhu, Yudong Chen, Yingyu Liang, and Charles Dyer.
📗 Content from note blocks marked "optional" and content from Wikipedia and other demo links are helpful for understanding the materials, but will not be explicitly tested on the exams.