
# Overview

📗 Readings: MARL Chapter 3 and AI Chapter 9.
📗 Wikipedia page: Link

# Game Theory

📗 If there are multiple agents and the reward for one agent depends on the actions taken by all other agents, then these agents are participating in a game.
📗 For zero-sum games, if player 1 gets \(R\left(a\right) = R\left(a_{1}, a_{2}\right)\), then player 2 gets \(- R\left(a_{1}, a_{2}\right)\). The value of the game is defined as \(V = \displaystyle\max_{a_{1}} \displaystyle\min_{a_{2}} R\left(a_{1}, a_{2}\right) = \displaystyle\min_{a_{2}} \displaystyle\max_{a_{1}} R\left(a_{1}, a_{2}\right)\). The value is unique due to the Minimax Theorem; in general the equality holds when players may use mixed strategies, and with pure actions it holds only when a saddle point exists. The policy \(\pi = \left(a^\star_{1}, a^\star_{2}\right)\) that achieves the value \(V\) is called a Nash equilibrium, which is defined as the solution to \(a^\star_{1} = \mathop{\mathrm{argmax}}_{a_{1}} R\left(a_{1}, a^\star_{2}\right)\) and \(a^\star_{2} = \mathop{\mathrm{argmin}}_{a_{2}} R\left(a^\star_{1}, a_{2}\right)\), meaning neither player 1 nor player 2 would prefer to unilaterally change their action from \(a^\star_{1}\) or \(a^\star_{2}\) to another action.
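📗 As a minimal sketch of the definitions above, the following Python computes \(\displaystyle\max_{a_{1}} \displaystyle\min_{a_{2}} R\) and \(\displaystyle\min_{a_{2}} \displaystyle\max_{a_{1}} R\) over pure actions and lists the pure saddle points. The function names and the example matrices are illustrative assumptions, not part of the course material; rows index player 1's actions and columns index player 2's actions.

```python
def maximin(R):
    """max over a1 of min over a2: the reward player 1 can guarantee."""
    return max(min(row) for row in R)


def minimax(R):
    """min over a2 of max over a1: the cap player 2 can enforce."""
    n_cols = len(R[0])
    return min(max(row[j] for row in R) for j in range(n_cols))


def pure_saddle_points(R):
    """All (a1, a2) such that a1 maximizes R(., a2) and a2 minimizes R(a1, .)."""
    points = []
    for i, row in enumerate(R):
        for j, r in enumerate(row):
            column = [other_row[j] for other_row in R]
            if r == max(column) and r == min(row):
                points.append((i, j))
    return points


# Hypothetical reward matrix for player 1 (player 2 receives the negative).
R_example = [[4, 2, 5],
             [3, 1, 0],
             [6, 3, 7]]
print(maximin(R_example), minimax(R_example))   # 3 3
print(pure_saddle_points(R_example))            # [(2, 1)]

# Matching pennies: no pure saddle point, so the two pure-action values differ.
R_pennies = [[1, -1],
             [-1, 1]]
print(maximin(R_pennies), minimax(R_pennies))   # -1 1
print(pure_saddle_points(R_pennies))            # []
```

In the first matrix the two quantities agree and the saddle point \(\left(2, 1\right)\) attains the value \(V = 3\). In matching pennies they differ over pure actions, so the value is only attained with mixed strategies.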
📗 For general-sum games, \(R\left(a\right) = R\left(a_{1}, a_{2}, ..., a_{n}\right)\) is a vector where \(R_{i}\left(a_{i}, a_{-i}\right)\) is the reward for player \(i\) given player \(i\) uses \(a_{i}\) and the other players use \(a_{-i}\), i.e. \(\left(a_{1}, a_{2}, ..., a_{n}\right)\) without \(a_{i}\). The Nash equilibrium is defined the same way as \(\pi = \left(a^\star_{1}, a^\star_{2}, ..., a^\star_{n}\right)\) where \(a^\star_{i} = \mathop{\mathrm{argmax}}_{a_{i}} R_{i}\left(a_{i}, a^\star_{-i}\right)\) for every player \(i\), meaning no player would prefer to unilaterally change their action from \(a^\star_{i}\) to another action. The value of the game is undefined since there could be multiple Nash equilibria with different rewards.
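📗 The Nash condition for general-sum games can be checked by brute force in the two-player case. The sketch below is illustrative only: the function name, the two example games, and the restriction to two players with pure actions are assumptions made here. `R1[a1][a2]` and `R2[a1][a2]` are the rewards of players 1 and 2.

```python
def pure_nash_equilibria(R1, R2):
    """Return every (a1, a2) where neither player gains by deviating alone."""
    n1, n2 = len(R1), len(R1[0])
    equilibria = []
    for a1 in range(n1):
        for a2 in range(n2):
            # Best reward player 1 could get against a2, and player 2 against a1.
            best1 = max(R1[b][a2] for b in range(n1))
            best2 = max(R2[a1][b] for b in range(n2))
            if R1[a1][a2] == best1 and R2[a1][a2] == best2:
                equilibria.append((a1, a2))
    return equilibria


# Prisoner's dilemma (action 0 = cooperate, 1 = defect): one equilibrium.
PD_R1 = [[-1, -3],
         [0, -2]]
PD_R2 = [[-1, 0],
         [-3, -2]]
print(pure_nash_equilibria(PD_R1, PD_R2))    # [(1, 1)]

# Coordination game: two equilibria with different rewards, (2, 1) and (1, 2).
CO_R1 = [[2, 0],
         [0, 1]]
CO_R2 = [[1, 0],
         [0, 2]]
print(pure_nash_equilibria(CO_R1, CO_R2))    # [(0, 0), (1, 1)]
```

The second game shows why a single value is not defined for general-sum games: the two equilibria give each player a different reward.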

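📗 One way to find a (stable) Nash equilibrium is through best response dynamics, i.e., start with arbitrary \(a_{1}, a_{2}, ..., a_{n}\) and iteratively update \(a_{i} \leftarrow \mathop{\mathrm{argmax}}_{a_{i}} R_{i}\left(a_{i}, a_{-i}\right)\) for each player in turn. If no player changes their action in a full round, the current actions form a Nash equilibrium; the updates can also cycle forever, so convergence is not guaranteed.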
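📗 Best response dynamics can be sketched for two players as follows. The helper names, the tie-breaking rule (keep the current action if it is already a best response), the `max_rounds` cap, and the example games are all assumptions made for illustration.

```python
def best_response(rewards, current):
    """Index of a reward-maximizing action, keeping `current` on ties."""
    best = max(rewards)
    return current if rewards[current] == best else rewards.index(best)


def best_response_dynamics(R1, R2, start, max_rounds=100):
    """Alternate best responses; return (action profile, converged flag)."""
    a1, a2 = start
    for _ in range(max_rounds):
        # Player 1 responds to a2, then player 2 responds to the new a1.
        new_a1 = best_response([row[a2] for row in R1], a1)
        new_a2 = best_response(R2[new_a1], a2)
        if (new_a1, new_a2) == (a1, a2):
            return (a1, a2), True    # nobody wants to deviate
        a1, a2 = new_a1, new_a2
    return (a1, a2), False           # hit the round cap without settling


# Coordination game: the dynamics settle on an equilibrium.
CO_R1 = [[2, 0], [0, 1]]
CO_R2 = [[1, 0], [0, 2]]
print(best_response_dynamics(CO_R1, CO_R2, start=(0, 1)))      # ((1, 1), True)

# Matching pennies (zero-sum): the pure best responses cycle forever.
MP_R1 = [[1, -1], [-1, 1]]
MP_R2 = [[-1, 1], [1, -1]]
print(best_response_dynamics(MP_R1, MP_R2, start=(0, 0))[1])   # False
```

Which equilibrium is reached depends on the starting actions and the update order, so different starts in the coordination game can end at different equilibria.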


Last Updated: May 07, 2024 at 12:22 AM