Zero-sum Game with Unique Nash and Minimal Modification#
- game_redesign.RAP(policy_1, policy_2, original_reward, reward_bound=inf, value_lower_bound=-inf, value_upper_bound=inf, value_gap=None, perturbation_bound=None, perturbation_value=None, condition_number_bound=100)#
Find a zero-sum game redesigned_reward with (policy_1, policy_2) being the unique Nash equilibrium. The game closest to original_reward, measured by the L1 norm, is returned if original_reward is not None.
- Parameters:
policy_1 ((M,) array) – Target policy for player 1. policy_1[i] is the probability that player 1 uses action i.
policy_2 ((N,) array) – Target policy for player 2. policy_2[j] is the probability that player 2 uses action j.
original_reward ((M, N) array, optional) – The original reward matrix. original_reward[i][j] is the reward for player 1 when player 1 uses action i and player 2 uses action j.
reward_bound (float or inf, optional) – All entries of redesigned_reward will be in the interval [-reward_bound, reward_bound] (the reward bound is symmetric since the game is zero-sum). The entries of original_reward are not required to be in this interval.
value_lower_bound (float or -inf, optional) – The lower bound on the Nash equilibrium value of redesigned_reward.
value_upper_bound (float or inf, optional) – The upper bound on the Nash equilibrium value of redesigned_reward.
value_gap (float or inf, optional) – The Nash equilibrium value of redesigned_reward will be at least value_gap larger than the value obtained from actions that are not in the support of policy_1 and policy_2. If value_gap is None, the default value of 0.01 * numpy.linalg.norm(original_reward, numpy.inf) is used. This is the \(\iota\) in [1] and [2].
perturbation_bound (float, optional) – The maximum absolute perturbation added to redesigned_reward. The perturbation_value will be a uniform random number in the interval [-perturbation_bound, perturbation_bound]. If perturbation_bound is None, the default value of 0.01 * numpy.linalg.norm(original_reward, numpy.inf) is used. This is the \(\lambda\) in [1].
perturbation_value (float, optional) – The value of the perturbation to use. If perturbation_value is None, a uniform random number in the interval [-perturbation_bound, perturbation_bound] is used; otherwise, if the supplied value does not lead to a game with a unique Nash equilibrium, InvalidPerturbationError is raised. This is the \(\epsilon\) in [1].
condition_number_bound (float, optional) – If the condition number of a matrix is larger than condition_number_bound, the matrix is considered singular when checking the invertibility condition for Nash uniqueness.
- Returns:
The zero-sum game redesigned_reward with (policy_1, policy_2) being the unique Nash equilibrium. The game closest to original_reward, measured by the L1 norm, is returned if original_reward is not None.
- Return type:
(M, N) array
- Raises:
InvalidPolicyError – Raised if policy_1 or policy_2 is not a probability distribution.
InvalidOriginalRewardError – Raised if policy_1 has M elements and policy_2 has N elements, but original_reward is not an M by N matrix.
UnequalSupportSizeError – Raised if the supports of policy_1 and policy_2 do not have the same size, in which case the problem is infeasible.
ValueRangeError – Raised if the Nash value range [value_lower_bound, value_upper_bound] and the interior of the adjusted reward value range (-reward_bound + value_gap + perturbation_bound, reward_bound - value_gap - perturbation_bound) have an empty intersection, in which case the problem is infeasible (see the sketch after this list).
InvalidPerturbationError – Raised if perturbation_value is supplied but the resulting redesigned_reward does not have a unique Nash equilibrium (value_gap and condition_number_bound are used to check the SIISOW and INV conditions; see [1] for details).
Notes#
Given policy_1 (\(\mathbf{p}\) with support \(\mathcal{I} \subseteq \mathcal{A}_{1}\)), policy_2 (\(\mathbf{q}\) with support \(\mathcal{J} \subseteq \mathcal{A}_{2}\)), original_reward (\(R^\circ\)), reward_bound (\(b\)), value_lower_bound (\(\underline{v}\)), value_upper_bound (\(\overline{v}\)), value_gap (\(\iota\)), perturbation_value (\(\epsilon\)), and perturbation_bound (\(\lambda\)), the output redesigned_reward (\(R^\dagger\)) is computed by:
Step 1 (Relax):
Step 2 (and Perturb):
Start with \(R^\dagger = R\) and, while the condition number of \(\begin{bmatrix} R_{\mathcal{I} \mathcal{J}} & -{\mathbf{1}}_{\left| \mathcal{I} \right|} \\ {\mathbf{1}}^\top_{\left| \mathcal{J} \right|} & 0 \end{bmatrix}\) is larger than condition_number_bound, repeat the following:
The \(R^{\text{eRPS}}\) in the perturbation step is the extended rock paper scissors game; see game_redesign.eRPS(). Details of the algorithm can be found in [1].
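The invertibility (INV) check in the perturbation loop can be reproduced with numpy. The following is a minimal sketch (not part of the library), assuming R_IJ is the candidate reward matrix restricted to the support rows \(\mathcal{I}\) and support columns \(\mathcal{J}\):

import numpy as np

def fails_inv_check(R_IJ, condition_number_bound=100):
    # Bordered support matrix [[R_IJ, -1], [1^T, 0]] from the condition above.
    k = R_IJ.shape[0]
    bordered = np.block([
        [R_IJ, -np.ones((k, 1))],
        [np.ones((1, k)), np.zeros((1, 1))],
    ])
    # The matrix is treated as singular (so another perturbation is drawn)
    # when its condition number exceeds condition_number_bound.
    return np.linalg.cond(bordered) > condition_number_bound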
If perturbation_value is None, the function uses a random perturbation and therefore returns a different game every time it is called.
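A hedged illustration, reusing calls from the Examples section below (the exact entries of the first two outputs depend on the library's random draw):

import numpy as np
import game_redesign

# Random perturbation: the two redesigned Morra games below generally differ.
A = game_redesign.RAP([7/12, 5/12], [7/12, 5/12], [[2, -3], [-3, 4]], float("inf"), 0, 0)
B = game_redesign.RAP([7/12, 5/12], [7/12, 5/12], [[2, -3], [-3, 4]], float("inf"), 0, 0)

# Fixed perturbation (the constant-zero-reward example below): the output is repeatable.
C = game_redesign.RAP([1/3, 1/3, 1/3], [1/3, 1/3, 1/3],
                      [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                      reward_bound=1, perturbation_value=0.01)
D = game_redesign.RAP([1/3, 1/3, 1/3], [1/3, 1/3, 1/3],
                      [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                      reward_bound=1, perturbation_value=0.01)
assert np.allclose(C, D)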
References#
Examples#
Example 1 from [1]: The two-finger Morra example.
game_redesign.RAP([7/12, 5/12], [7/12, 5/12], [[2, -3], [-3, 4]], float("inf"), 0, 0)
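As a sanity check (not part of the documented output), the returned matrix can be tested numerically: with player 1 as the row maximizer, no pure-strategy deviation from either target policy should improve on the Nash value. A minimal sketch, assuming the return value converts cleanly to a numpy array:

import numpy as np
import game_redesign

p = np.array([7/12, 5/12])
R = np.asarray(game_redesign.RAP(p, p, [[2, -3], [-3, 4]], float("inf"), 0, 0))
v = p @ R @ p                      # Nash value of the redesigned game
assert np.all(R @ p <= v + 1e-8)   # player 1 (row maximizer) cannot gain by a pure deviation
assert np.all(p @ R >= v - 1e-8)   # player 2 (who receives -R) cannot gain either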
Example 2 from [1]: The rock paper scissors fire water example.
game_redesign.RAP([1/5, 1/5, 1/5, 1/5, 1/5], [1/5, 1/5, 1/5, 1/5, 1/5], [[0, -1, 1, -1, 1], [1, 0, -1, -1, 1], [-1, 1, 0, -1, 1], [1, 1, 1, 0, -1], [-1, -1, -1, 1, 0]], 1)
An example with a constant zero original reward, so the redesigned reward matrix will be the extended rock paper scissors game scaled by perturbation_value.
game_redesign.RAP([1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [[0, 0, 0], [0, 0, 0], [0, 0, 0]], reward_bound=1, perturbation_value=0.01)
An example where UnequalSupportSizeError is raised, since the supports of policy_1 and policy_2 have different sizes.
game_redesign.RAP([1/3, 1/3, 1/3], [1/2, 0, 1/2], [[0, 0, 0], [0, 0, 0], [0, 0, 0]])
An example where ValueRangeError is raised, since the adjusted reward value range is empty.
game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[1, 0], [0, 1]], reward_bound=1, value_gap=0.5, perturbation_bound=0.5)
An example where InvalidPerturbationError is raised, since the perturbation_value is too small for the SIISOW or INV conditions to be satisfied numerically, based on value_gap and condition_number_bound.
game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[0, 0], [0, 0]], value_gap=0.2, perturbation_value=0.1, condition_number_bound=100)
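When a supplied perturbation_value is rejected, as in the call above, one option is to catch the error and retry with a larger perturbation. A minimal sketch of that pattern (not prescribed by the library, and assuming the exception class is importable from game_redesign; the retry value 0.5 is an illustrative guess chosen larger than value_gap):

import game_redesign

try:
    R = game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[0, 0], [0, 0]],
                          value_gap=0.2, perturbation_value=0.1,
                          condition_number_bound=100)
except game_redesign.InvalidPerturbationError:
    # 0.1 is too small relative to value_gap=0.2; retry with a larger perturbation.
    R = game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[0, 0], [0, 0]],
                          value_gap=0.2, perturbation_value=0.5,
                          condition_number_bound=100)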