Zero-sum Game with Unique Nash and Minimal Modification
=======================================================
.. autofunction:: game_redesign.RAP

Notes
-----

* The required packages include ``numpy`` and ``gurobi``.
* Given ``policy_1`` (:math:`\mathbf{p}` with support :math:`\mathcal{I} \subseteq \mathcal{A}_{1}`), ``policy_2`` (:math:`\mathbf{q}` with support :math:`\mathcal{J} \subseteq \mathcal{A}_{2}`), ``original_reward`` (:math:`R^\circ`), ``reward_bound`` (:math:`b`), ``value_lower_bound`` (:math:`\underline{v}`), ``value_upper_bound`` (:math:`\overline{v}`), ``value_gap`` (:math:`\iota`), ``perturbation_value`` (:math:`\epsilon`), and ``perturbation_bound`` (:math:`\lambda`), the output ``redesigned_reward`` (:math:`R^\dagger`) is computed as follows.
Step 1 (Relax):
.. math::

    \min_{R, v} \quad & \left\|R - R^\circ\right\|_{1}
    \\ \text{s.t.} \quad & R_{\mathcal{I}\bullet} \mathbf{q} = v {\mathbf{1}}_{\left| \mathcal{I} \right|}
    \\ & \mathbf{p}^\top R_{\bullet\mathcal{J}} = v {\mathbf{1}}^\top_{\left| \mathcal{J} \right|}
    \\ & R_{\mathcal{A}_{1}\setminus\mathcal{I}\bullet} \mathbf{q} \leq \left(v - \iota\right) {\mathbf{1}}_{\left| \mathcal{A}_{1}\setminus\mathcal{I} \right|}
    \\ & \mathbf{p}^\top R_{\bullet \mathcal{A}_{2}\setminus\mathcal{J}} \geq \left(v + \iota\right) {\mathbf{1}}^\top_{\left| \mathcal{A}_{2}\setminus\mathcal{J} \right|}
    \\ & \underline{v} \leq v \leq \overline{v}
    \\ & -b+\lambda \leq R_{i j} \leq b-\lambda, \quad \forall \left(i, j\right) \in \mathcal{A}.
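Once the :math:`\ell_1` objective is linearized with slack variables, Step 1 is an ordinary linear program. The sketch below is illustrative only and is not the package's implementation (which relies on Gurobi): the helper name ``relax_step`` is hypothetical, and ``scipy.optimize.linprog`` stands in for the Gurobi solver.

```python
import numpy as np
from scipy.optimize import linprog

def relax_step(p, q, R0, b=np.inf, v_lo=0.0, v_hi=0.0, iota=0.1, lam=0.0):
    """Find the reward matrix closest to R0 in entrywise L1 norm that makes
    (p, q) a Nash equilibrium with value v in [v_lo, v_hi] (Step 1 sketch)."""
    p, q, R0 = np.asarray(p, float), np.asarray(q, float), np.asarray(R0, float)
    m, n = R0.shape
    I, J = np.flatnonzero(p > 0), np.flatnonzero(q > 0)  # policy supports
    N = m * n                       # decision vector x = [vec(R), t, v]
    A_eq, b_eq, A_ub, b_ub = [], [], [], []
    for i in range(m):              # row player: compare R_{i.} q with v
        row = np.zeros(2 * N + 1)
        row[i * n:(i + 1) * n] = q
        row[-1] = -1.0
        if i in I:                  # support rows are indifferent: = v
            A_eq.append(row); b_eq.append(0.0)
        else:                       # off-support rows lose: <= v - iota
            A_ub.append(row); b_ub.append(-iota)
    for j in range(n):              # column player: compare p^T R_{.j} with v
        col = np.zeros(2 * N + 1)
        col[j:N:n] = p
        col[-1] = -1.0
        if j in J:                  # support columns are indifferent: = v
            A_eq.append(col); b_eq.append(0.0)
        else:                       # off-support columns lose: >= v + iota
            A_ub.append(-col); b_ub.append(-iota)
    for k in range(N):              # t_k >= |R_k - R0_k|, split into two cuts
        for s in (1.0, -1.0):
            cut = np.zeros(2 * N + 1)
            cut[k], cut[N + k] = s, -1.0
            A_ub.append(cut); b_ub.append(s * R0.ravel()[k])
    c = np.r_[np.zeros(N), np.ones(N), 0.0]      # minimize sum of L1 slacks
    bounds = [(-b + lam, b - lam)] * N + [(0, None)] * N + [(v_lo, v_hi)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    return res.x[:N].reshape(m, n), res.x[-1]
```

On the two finger morra instance with the value pinned to zero, the returned matrix satisfies the indifference constraints :math:`R\mathbf{q} = v\mathbf{1}` and :math:`\mathbf{p}^\top R = v\mathbf{1}^\top` by construction.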
Step 2 (and Perturb):
Start with :math:`R^\dagger = R` and while the condition number of :math:`\begin{bmatrix} R_{\mathcal{I} \mathcal{J}} & -{\mathbf{1}}_{\left| \mathcal{I} \right|} \\ {\mathbf{1}}^\top_{\left| \mathcal{J} \right|} & 0 \end{bmatrix}` is larger than ``condition_number_bound``, repeat the following:
.. math::

    \text{sample} \quad & \epsilon \sim \text{Uniform}\left[-\lambda, \lambda\right],
    \\ & R^\dagger = R + \epsilon R^{\text{eRPS}}.
Here :math:`R^{\text{eRPS}}` is the extended rock paper scissors game used in the perturbation step; see :py:func:`game_redesign.eRPS`. Details of the algorithm can be found in [1]_.
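Step 2 can be sketched with plain ``numpy``. The helper name ``perturb_step`` is hypothetical, the matrix dimensions are fixed at 3 for brevity, and it is an assumption here that the 3-action instance of the extended rock paper scissors game is the classic rock paper scissors matrix:

```python
import numpy as np

# Assumption: for 3 actions, the extended rock paper scissors game reduces
# to classic rock paper scissors (see game_redesign.eRPS for the real matrix).
R_ERPS_3 = np.array([[0., -1., 1.],
                     [1., 0., -1.],
                     [-1., 1., 0.]])

def perturb_step(R, I, J, lam=0.5, cond_bound=1e6, seed=None):
    """Step 2 sketch: add eps * eRPS to the relaxed matrix R until the block
    matrix built from the support submatrix is well conditioned."""
    rng = np.random.default_rng(seed)
    R_dag = R.copy()
    while True:
        # Block matrix [[R_IJ, -1], [1^T, 0]] from the indifference system;
        # a bounded condition number certifies the equilibrium is unique.
        top = np.hstack([R_dag[np.ix_(I, J)], -np.ones((len(I), 1))])
        bottom = np.hstack([np.ones((1, len(J))), [[0.0]]])
        M = np.vstack([top, bottom])
        if np.linalg.cond(M) <= cond_bound:
            return R_dag
        eps = rng.uniform(-lam, lam)        # resample the perturbation size
        R_dag = R + eps * R_ERPS_3
```

Because every row and column of the rock paper scissors matrix sums to zero, the perturbation leaves the uniform policies indifferent while breaking the degeneracy of an all-zero support submatrix.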
* If ``perturbation_value`` is ``None``, the function uses a random perturbation and therefore returns a different game every time the function is called.
References
----------

.. [1] Wu, Y., McMahan, J., Chen, Y., Chen, Y., Zhu, X., & Xie, Q. (2023). Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value. arXiv preprint.
.. [2] Wu, Y., McMahan, J., Zhu, X., & Xie, Q. (2023). On faking a Nash equilibrium. arXiv preprint.
Examples
--------

(1) Example 1 from [1]_: The two finger morra example.
.. code-block:: python

    game_redesign.RAP([7/12, 5/12], [7/12, 5/12], [[2, -3], [-3, 4]], float("inf"), 0, 0)
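For context (a quick check outside the package), the original two finger morra game has value :math:`-1/12` under these policies, so pinning the value bounds to zero forces a genuine modification:

```python
import numpy as np

# Value of the original two finger morra game under the given mixed policies.
p = q = np.array([7., 5.]) / 12
R = np.array([[2., -3.], [-3., 4.]])
row_payoffs = R @ q        # both rows earn -1/12: (p, q) is already equalizing
value = p @ R @ q          # game value -1/12, not the requested 0
```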
(2) Example 2 from [1]_: The rock paper scissors fire water example.
.. code-block:: python

    game_redesign.RAP([1/5, 1/5, 1/5, 1/5, 1/5], [1/5, 1/5, 1/5, 1/5, 1/5], [[0, -1, 1, -1, 1], [1, 0, -1, -1, 1], [-1, 1, 0, -1, 1], [1, 1, 1, 0, -1], [-1, -1, -1, 1, 0]], 1)
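As a sanity check (again outside the package), the uniform profile is not an equilibrium of the original rock paper scissors fire water game, which is exactly why a modification is needed:

```python
import numpy as np

R = np.array([[0, -1, 1, -1, 1],
              [1, 0, -1, -1, 1],
              [-1, 1, 0, -1, 1],
              [1, 1, 1, 0, -1],
              [-1, -1, -1, 1, 0]], dtype=float)
u = np.full(5, 1 / 5)      # uniform policy for both players
row_payoffs = R @ u        # fire (row 4) earns 2/5 against uniform play,
                           # so the row player would deviate from uniform
```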
(3) An example with constant zero original reward, so the redesigned reward matrix will be the extended rock paper scissors game scaled by ``perturbation_value``.
.. code-block:: python

    game_redesign.RAP([1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [[0, 0, 0], [0, 0, 0], [0, 0, 0]], reward_bound=1, perturbation_value=0.01)
(4) An example where ``UnequalSupportSizeError`` is raised since the supports of ``policy_1`` and ``policy_2`` have different sizes.
.. code-block:: python

    game_redesign.RAP([1/3, 1/3, 1/3], [1/2, 0, 1/2], [[0, 0, 0], [0, 0, 0], [0, 0, 0]])
(5) An example where ``ValueRangeError`` is raised since the adjusted reward value range is empty.
.. code-block:: python

    game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[1, 0], [0, 1]], reward_bound=1, value_gap=0.5, perturbation_bound=0.5)
(6) An example where ``InvalidPerturbationError`` is raised because ``perturbation_value`` is too small for the SIISOW or INV conditions to be satisfied numerically, given ``value_gap`` and ``condition_number_bound``.
.. code-block:: python

    game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[0, 0], [0, 0]], value_gap=0.2, perturbation_value=0.1, condition_number_bound=100)