Young Wu's Homepage


# Overview

📗 Readings: MARL Chapter 3 and AI Chapter 9.

📗 Wikipedia page: Link

# Game Theory

📗 If there are multiple agents and the reward for one agent depends on the actions taken by all other agents, then these agents are participating in a game.

📗 For zero-sum games, if player 1 gets \(R\left(a\right) = R\left(a_{1}, a_{2}\right)\), then player 2 gets \(- R\left(a_{1}, a_{2}\right)\). The value of the game is defined as \(V = \displaystyle\max_{a_{1}} \displaystyle\min_{a_{2}} R\left(a_{1}, a_{2}\right) = \displaystyle\min_{a_{2}} \displaystyle\max_{a_{1}} R\left(a_{1}, a_{2}\right)\). By the Minimax Theorem, the two sides are equal and the value is unique when the players may use mixed (randomized) strategies; with pure actions only, the equality holds only when the game has a saddle point. A policy \(\pi = \left(a^\star_{1}, a^\star_{2}\right)\) that achieves the value \(V\) is called a Nash equilibrium, which is defined as a solution to \(a^\star_{1} = \mathop{\mathrm{argmax}}_{a_{1}} R\left(a_{1}, a^\star_{2}\right)\) and \(a^\star_{2} = \mathop{\mathrm{argmin}}_{a_{2}} R\left(a^\star_{1}, a_{2}\right)\), meaning neither player 1 nor player 2 would prefer to unilaterally change their action from \(a^\star_{1}, a^\star_{2}\) to another action.
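
📗 A minimal sketch of the max-min and min-max computations above, restricted to pure actions. The reward matrices `R_saddle` and `R_pennies` and the function names are made up for illustration; rows are player 1's actions and columns are player 2's actions.

```python
def maximin(R):
    """Player 1 picks the row whose worst-case (minimum) reward is largest."""
    return max(min(row) for row in R)


def minimax(R):
    """Player 2 picks the column whose best-case (maximum) reward for player 1 is smallest."""
    columns = zip(*R)  # transpose: iterate over columns of R
    return min(max(col) for col in columns)


def pure_saddle_points(R):
    """Return all (a1, a2) where R[a1][a2] is the max of its column and the min of its row."""
    points = []
    for a1, row in enumerate(R):
        for a2, r in enumerate(row):
            column = [R[k][a2] for k in range(len(R))]
            if r == max(column) and r == min(row):
                points.append((a1, a2))
    return points


# Hypothetical game with a pure saddle point at (row 0, column 1).
R_saddle = [
    [4, 2, 3],
    [1, 0, 5],
    [3, 1, 2],
]

# Matching pennies: no pure saddle point, so max-min and min-max differ.
R_pennies = [
    [1, -1],
    [-1, 1],
]

print(maximin(R_saddle), minimax(R_saddle), pure_saddle_points(R_saddle))
print(maximin(R_pennies), minimax(R_pennies), pure_saddle_points(R_pennies))
```

In the first game the two quantities agree and the saddle point is a pure Nash equilibrium. In matching pennies they differ in pure actions, and the equality is recovered only once mixed strategies are allowed.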

📗 For general-sum games, \(R\left(a\right) = R\left(a_{1}, a_{2}, ..., a_{n}\right)\) is a vector where \(R_{i}\left(a_{i}, a_{-i}\right)\) is the reward for player \(i\) given that player \(i\) uses \(a_{i}\) and the other players use \(a_{-i} = \left(a_{1}, ..., a_{i-1}, a_{i+1}, ..., a_{n}\right)\), i.e. all actions except \(a_{i}\). The Nash equilibrium is defined the same way as \(\pi = \left(a^\star_{1}, a^\star_{2}, ..., a^\star_{n}\right)\) where \(a^\star_{i} = \mathop{\mathrm{argmax}}_{a_{i}} R_{i}\left(a_{i}, a^\star_{-i}\right)\) for every player \(i\), meaning no player would prefer to unilaterally change their action from \(a^\star_{i}\) to another action. The value of the game is in general not well defined, since there could be multiple Nash equilibria with different rewards.
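
📗 A minimal sketch of the Nash equilibrium condition for a two-player general-sum game, checking every pure action pair by brute force. The reward matrices `R1_coord` and `R2_coord` (a small coordination game) and the function name are made up for illustration; `R1[a1][a2]` and `R2[a1][a2]` are the rewards of players 1 and 2.

```python
def pure_nash_equilibria(R1, R2):
    """Return all pure action pairs where neither player gains by deviating alone."""
    n1, n2 = len(R1), len(R1[0])
    equilibria = []
    for a1 in range(n1):
        for a2 in range(n2):
            # Player 1 cannot improve by changing a1 while a2 stays fixed.
            p1_ok = all(R1[a1][a2] >= R1[b][a2] for b in range(n1))
            # Player 2 cannot improve by changing a2 while a1 stays fixed.
            p2_ok = all(R2[a1][a2] >= R2[a1][b] for b in range(n2))
            if p1_ok and p2_ok:
                equilibria.append((a1, a2))
    return equilibria


# Hypothetical coordination game: both players want to match,
# but player 1 prefers action 0 and player 2 prefers action 1.
R1_coord = [
    [2, 0],
    [0, 1],
]
R2_coord = [
    [1, 0],
    [0, 2],
]

for a1, a2 in pure_nash_equilibria(R1_coord, R2_coord):
    print((a1, a2), "rewards:", (R1_coord[a1][a2], R2_coord[a1][a2]))
```

This game has two pure equilibria with different reward vectors, which illustrates why a single value cannot be assigned to a general-sum game.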

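📗 One way to find a (stable) Nash equilibrium is through best response dynamics, i.e., start with arbitrary \(a_{1}, a_{2}, ..., a_{n}\) and iteratively update \(a_{i} = \mathop{\mathrm{argmax}}_{a_{i}} R_{i}\left(a_{i}, a_{-i}\right)\) for each player in turn until no player changes their action. The iteration is not guaranteed to converge and may cycle between action profiles. Hide-and-Seek: Link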
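
📗 A minimal sketch of best response dynamics for two players, assuming the players update one after the other and ties are broken in favor of the lowest action index. The function name, the `max_rounds` cap, and the example matrices are made up for illustration.

```python
def best_response_dynamics(R1, R2, a1=0, a2=0, max_rounds=100):
    """Alternate best responses; return ((a1, a2), converged)."""
    n1, n2 = len(R1), len(R1[0])
    for _ in range(max_rounds):
        # Player 1 best-responds to player 2's current action.
        new_a1 = max(range(n1), key=lambda b: R1[b][a2])
        # Player 2 best-responds to player 1's updated action.
        new_a2 = max(range(n2), key=lambda b: R2[new_a1][b])
        if (new_a1, new_a2) == (a1, a2):
            # No player changed their action: this is a pure Nash equilibrium.
            return (a1, a2), True
        a1, a2 = new_a1, new_a2
    # The round limit was reached without reaching a fixed point.
    return (a1, a2), False


# Hypothetical coordination game: the dynamics settle on an equilibrium.
R1_coord = [[2, 0], [0, 1]]
R2_coord = [[1, 0], [0, 2]]
print(best_response_dynamics(R1_coord, R2_coord, a1=0, a2=1))

# Matching pennies (zero-sum): the dynamics cycle and never settle.
R1_pennies = [[1, -1], [-1, 1]]
R2_pennies = [[-1, 1], [1, -1]]
print(best_response_dynamics(R1_pennies, R2_pennies))
```

Which equilibrium is reached in the coordination game depends on the starting actions and the update order, and in matching pennies the `converged` flag stays `False` because no pure equilibrium exists.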


Last Updated: July 01, 2025 at 1:46 AM