# Game Theory
📗 If there are multiple agents and the reward for one agent depends on the actions taken by all other agents, then these agents are participating in a game.
📗 For zero-sum games, if player 1 gets \(R\left(a\right) = R\left(a_{1}, a_{2}\right)\), then player 2 gets \(- R\left(a_{1}, a_{2}\right)\). The value of the game is defined as \(V = \displaystyle\max_{a_{1}} \displaystyle\min_{a_{2}} R\left(a_{1}, a_{2}\right) = \displaystyle\min_{a_{2}} \displaystyle\max_{a_{1}} R\left(a_{1}, a_{2}\right)\); the value is unique by the Minimax Theorem (the two sides are guaranteed to be equal when mixed strategies are allowed; for pure strategies they coincide only if the game has a saddle point). The policy \(\pi = \left(a^\star_{1}, a^\star_{2}\right)\) that achieves the value \(V\) is called a Nash equilibrium, defined as the solution to \(a^\star_{1} = \mathop{\mathrm{argmax}}_{a_{1}} R\left(a_{1}, a^\star_{2}\right)\) and \(a^\star_{2} = \mathop{\mathrm{argmin}}_{a_{2}} R\left(a^\star_{1}, a_{2}\right)\), meaning neither player would prefer to unilaterally change their action from \(a^\star_{1}\) or \(a^\star_{2}\) to another action.
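As a minimal sketch, the max-min and min-max values can be computed directly on a payoff matrix. The matrix below is a made-up example that happens to have a saddle point, so the two values agree even in pure strategies:

```python
# Hypothetical 3x3 zero-sum payoff matrix for player 1 (player 2 receives the negation).
# Entry R[a1][a2] is player 1's reward when player 1 plays row a1 and player 2 plays column a2.
R = [
    [3, 2, 4],
    [5, 4, 6],
    [1, 0, 2],
]

# Player 1 maximizes the worst case over player 2's responses: max_{a1} min_{a2} R(a1, a2).
max_min = max(min(row) for row in R)

# Player 2 minimizes the worst case over player 1's responses: min_{a2} max_{a1} R(a1, a2).
cols = list(zip(*R))
min_max = min(max(col) for col in cols)

print(max_min, min_max)  # both equal 4: the saddle point at (row 2, column 2)
```

The saddle-point entry (4 here) is simultaneously the minimum of its row and the maximum of its column, which is exactly the unilateral-deviation condition in the equilibrium definition above.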
📗 For general-sum games, \(R\left(a\right) = R\left(a_{1}, a_{2}, ..., a_{n}\right)\) is a vector where \(R_{i}\left(a_{i}, a_{-i}\right)\) is the reward for player \(i\) given player \(i\) uses \(a_{i}\) and the other players use \(a_{-i} = \left(a_{1}, ..., a_{i-1}, a_{i+1}, ..., a_{n}\right)\), the action profile with \(a_{i}\) removed. The Nash equilibrium is defined the same way, as \(\pi = \left(a^\star_{1}, a^\star_{2}, ..., a^\star_{n}\right)\) where \(a^\star_{i} = \mathop{\mathrm{argmax}}_{a_{i}} R_{i}\left(a_{i}, a^\star_{-i}\right)\) for every player \(i\), meaning no player would prefer to unilaterally change their action from \(a^\star_{i}\) to another action. The value of the game is undefined since there could be multiple Nash equilibria with different rewards.
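The unilateral-deviation condition can be checked by brute force on a small bimatrix game. A sketch using the standard Prisoner's Dilemma rewards (the specific numbers are an assumed example, not from the notes):

```python
# Hypothetical Prisoner's Dilemma rewards (negated sentence lengths): action 0 = cooperate, 1 = defect.
R1 = [[-1, -3], [0, -2]]   # R1[a1][a2]: reward to player 1
R2 = [[-1, 0], [-3, -2]]   # R2[a1][a2]: reward to player 2

def is_nash(a1, a2):
    """(a1, a2) is a Nash equilibrium if no player gains by unilaterally deviating."""
    best1 = all(R1[a1][a2] >= R1[b][a2] for b in range(2))  # a1 is a best response to a2
    best2 = all(R2[a1][a2] >= R2[a1][b] for b in range(2))  # a2 is a best response to a1
    return best1 and best2

equilibria = [(a1, a2) for a1 in range(2) for a2 in range(2) if is_nash(a1, a2)]
print(equilibria)  # [(1, 1)]: mutual defection is the unique pure Nash equilibrium
```

Note that the equilibrium \(\left(1, 1\right)\) gives both players \(-2\) even though \(\left(0, 0\right)\) would give both \(-1\): a Nash equilibrium need not maximize total reward.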
📗 One way to find a (stable) Nash equilibrium is through best response dynamics, i.e. start with arbitrary \(a_{1}, a_{2}, ..., a_{n}\) and iteratively update \(a_{i} = \mathop{\mathrm{argmax}}_{a_{i}} R_{i}\left(a_{i}, a_{-i}\right)\). If the updates reach a fixed point where no player changes their action, that action profile is a Nash equilibrium; best response dynamics is not guaranteed to converge in general. Hide-and-Seek:
Link
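Best response dynamics can be sketched on a small game. The two-player coordination game below (each player gets 1 for matching the other's action, 0 otherwise) is an assumed example chosen so the dynamics converge:

```python
# Best response dynamics on a hypothetical 2-player, 2-action coordination game:
# each player's reward is 1 if the actions match, 0 otherwise.
def reward(a_i, a_other):
    return 1 if a_i == a_other else 0

a = [0, 1]  # arbitrary starting action profile
for _ in range(10):
    changed = False
    for i in range(2):
        # Player i best-responds to the other player's current action.
        best = max(range(2), key=lambda b: reward(b, a[1 - i]))
        if best != a[i]:
            a[i] = best
            changed = True
    if not changed:
        break  # fixed point reached: a stable Nash equilibrium
print(a)  # both players play the same action
```

From the start \(\left(0, 1\right)\), player 1 switches to 1 to match player 2, after which neither player wants to deviate, so the dynamics stop at the equilibrium \(\left(1, 1\right)\).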