Zero-sum Game with Unique Nash and Minimal Modification
=======================================================

.. autofunction:: game_redesign.RAP

Notes
-----

* The required packages include `numpy <https://numpy.org>`__ and `gurobi <https://www.gurobi.com>`__.

* Given ``policy_1`` (:math:`\mathbf{p}` with support :math:`\mathcal{I} \subseteq \mathcal{A}_{1}`), ``policy_2`` (:math:`\mathbf{q}` with support :math:`\mathcal{J} \subseteq \mathcal{A}_{2}`), ``original_reward`` (:math:`R^\circ`), ``reward_bound`` (:math:`b`), ``value_lower_bound`` (:math:`\underline{v}`), ``value_upper_bound`` (:math:`\overline{v}`), ``value_gap`` (:math:`\iota`), ``perturbation_value`` (:math:`\epsilon`), and ``perturbation_bound`` (:math:`\lambda`), the output ``redesigned_reward`` (:math:`R^\dagger`) is computed as follows.

  Step 1 (Relax):

  .. math::

     \min_{R, v} \quad & \left\|R - R^\circ\right\|_{1} \\
     \text{s.t.} \quad & R_{\mathcal{I}\bullet} \mathbf{q} = v {\mathbf{1}}_{\left| \mathcal{I} \right|} \\
     & \mathbf{p}^\top R_{\bullet\mathcal{J}} = v {\mathbf{1}}^\top_{\left| \mathcal{J} \right|} \\
     & R_{\mathcal{A}_{1}\setminus\mathcal{I}\bullet} \mathbf{q} \leq \left(v - \iota\right) {\mathbf{1}}_{\left| \mathcal{A}_{1}\setminus\mathcal{I} \right|} \\
     & \mathbf{p}^\top R_{\bullet \mathcal{A}_{2}\setminus\mathcal{J}} \geq \left(v + \iota\right) {\mathbf{1}}^\top_{\left| \mathcal{A}_{2}\setminus\mathcal{J} \right|} \\
     & \underline{v} \leq v \leq \overline{v} \\
     & -b+\lambda \leq R_{i j} \leq b-\lambda, \; \forall \left(i, j\right) \in \mathcal{A}.

  Step 2 (and Perturb): Start with :math:`R^\dagger = R` and, while the condition number of :math:`\begin{bmatrix} R_{\mathcal{I} \mathcal{J}} & -{\mathbf{1}}_{\left| \mathcal{I} \right|} \\ {\mathbf{1}}^\top_{\left| \mathcal{J} \right|} & 0 \end{bmatrix}` is larger than ``condition_number_bound``, repeat the following:

  .. math::

     \text{sample} \quad & \epsilon \sim \text{Uniform}\left[-\lambda, \lambda\right], \\
     & R^\dagger = R + \epsilon R^{\text{eRPS}}.
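The stopping test in Step 2 can be sketched directly with numpy. This is only an illustration, not part of ``game_redesign``: ``nash_system_cond`` is a hypothetical helper that builds the block matrix above and returns its condition number.

```python
import numpy as np

def nash_system_cond(R_IJ):
    # Hypothetical helper (not part of game_redesign): condition number of
    # the block matrix [[R_IJ, -1], [1^T, 0]] from Step 2. Invertibility of
    # this matrix pins down a unique (q, v) on the supports I, J.
    k = R_IJ.shape[0]  # RAP requires |I| = |J|, so the support block is square
    M = np.block([
        [R_IJ,            -np.ones((k, 1))],
        [np.ones((1, k)),  np.zeros((1, 1))],
    ])
    return np.linalg.cond(M)

# Rock paper scissors: the system is well conditioned, so the perturbation
# loop would stop immediately; an all-zero support block is singular
# (infinite condition number) and would keep triggering eRPS perturbations.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
print(nash_system_cond(rps))
```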
  The matrix :math:`R^{\text{eRPS}}` used in the perturbation step is the extended rock paper scissors game, see :py:func:`game_redesign.eRPS`. Details of the algorithm can be found in [1]_.

* If ``perturbation_value`` is ``None``, the function uses a random perturbation and therefore returns a different game every time it is called.

References
----------

.. [1] Wu, Y., McMahan, J., Chen, Y., Chen, Y., Zhu, X., & Xie, Q. (2023). Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value. arXiv.

.. [2] Wu, Y., McMahan, J., Zhu, X., & Xie, Q. (2023). On Faking a Nash Equilibrium. arXiv.

Examples
--------

(1) Example 1 from [1]_: the two finger morra example.

    .. code-block:: python

       game_redesign.RAP([7/12, 5/12], [7/12, 5/12], [[2, -3], [-3, 4]], float("inf"), 0, 0)

(2) Example 2 from [1]_: the rock paper scissors fire water example.

    .. code-block:: python

       game_redesign.RAP([1/5, 1/5, 1/5, 1/5, 1/5],
                         [1/5, 1/5, 1/5, 1/5, 1/5],
                         [[0, -1, 1, -1, 1],
                          [1, 0, -1, -1, 1],
                          [-1, 1, 0, -1, 1],
                          [1, 1, 1, 0, -1],
                          [-1, -1, -1, 1, 0]],
                         1)

(3) An example with a constant zero original reward, so the redesigned reward matrix is the extended rock paper scissors game scaled by ``perturbation_value``.

    .. code-block:: python

       game_redesign.RAP([1/3, 1/3, 1/3],
                         [1/3, 1/3, 1/3],
                         [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                         reward_bound = 1,
                         perturbation_value = 0.01)

(4) An example where ``UnequalSupportSizeError`` is raised since the supports of ``policy_1`` and ``policy_2`` have different sizes.

    .. code-block:: python

       game_redesign.RAP([1/3, 1/3, 1/3],
                         [1/2, 0, 1/2],
                         [[0, 0, 0], [0, 0, 0], [0, 0, 0]])

(5) An example where ``ValueRangeError`` is raised since the adjusted reward value range is empty.

    .. code-block:: python

       game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[1, 0], [0, 1]],
                         reward_bound = 1,
                         value_gap = 0.5,
                         perturbation_bound = 0.5)

(6) An example where ``InvalidPerturbationError`` is raised since ``perturbation_value`` is too small for the SIISOW or INV conditions to be satisfied numerically given ``value_gap`` and ``condition_number_bound``.

    .. code-block:: python

       game_redesign.RAP([1/2, 1/2], [1/2, 1/2], [[0, 0], [0, 0]],
                         value_gap = 0.2,
                         perturbation_value = 0.1,
                         condition_number_bound = 100)
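The equality constraints enforced in Step 1 can be checked with plain numpy. In the two finger morra game of Example 1, :math:`(\mathbf{p}, \mathbf{q})` is already the Nash equilibrium of the original game (with value :math:`-1/12`), so both players are indifferent across all of their actions; this sketch verifies those indifference conditions without calling ``game_redesign``:

```python
import numpy as np

# Two finger morra from Example (1). Both supports are full, so the
# Step 1 constraints reduce to the indifference conditions
# R q = v 1 and p^T R = v 1^T.
p = np.array([7/12, 5/12])
q = np.array([7/12, 5/12])
R = np.array([[2., -3.], [-3., 4.]])

v = p @ R @ q  # game value under (p, q); equals -1/12 here

assert np.allclose(R @ q, v)  # row player indifferent: R_{I.} q = v 1
assert np.allclose(p @ R, v)  # column player indifferent: p^T R_{.J} = v 1^T
print(v)
```

Because the original reward already satisfies every Step 1 constraint at its Nash equilibrium, the relaxation step leaves this game essentially unmodified.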