
# These competitions are optional for CS540 students, unless you want to use them for participation grades.


# CP4 Competition Instruction

📗 In this competition, you will implement the policy gradient algorithm to find an approximate equilibrium of a Markov game. In particular, you are one of many players (influencers) whose house is at a fixed location in \(\left[0, 1\right] \times \left[0, 1\right]\), and your goal is to get the dog (receiver) to a location as close as possible to your house, while the other players, with possibly different house locations, want to do the same. In every round, the players simultaneously move to a location within a fixed radius of their current location. After every round, the dog moves to the center (average location) of all players.
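One round of these dynamics can be sketched as follows. This is an assumption-laden illustration, not the official environment: the movement radius `MAX_STEP` and the clipping to the unit square are made-up parameters for the demo.

```python
import numpy as np

# Sketch of one round: each influencer proposes a move, the move is
# clipped to a fixed radius and to the unit square, then the receiver
# (dog) jumps to the average of all influencer positions.

MAX_STEP = 0.05  # assumed per-round movement radius (illustrative)

def step(positions, moves, max_step=MAX_STEP):
    """positions: (n, 2) influencer locations; moves: (n, 2) desired offsets."""
    norms = np.linalg.norm(moves, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_step / np.maximum(norms, 1e-12))
    new_positions = np.clip(positions + moves * scale, 0.0, 1.0)
    dog = new_positions.mean(axis=0)  # dog moves to the center of all players
    return new_positions, dog

positions = np.array([[0.1, 0.1], [0.9, 0.9]])
moves = np.array([[0.2, 0.0], [0.0, -0.2]])
new_positions, dog = step(positions, moves)
```

Each desired move is rescaled so its length never exceeds the radius, which is why both influencers above travel only 0.05 despite requesting 0.2.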

📗 Submit a policy neural network to play against other students.

# Competition

 




📗 Editor mode:
📗 Number of influencers (2 - 5):
➩ Target positions:

📗 Number of receivers (1 - 5):
📗 Icon size:
📗 Green regions are valid positions for the influencers.
📗 Gray regions are invalid positions for the influencers (but not for the receivers).
➩ Vertices (add a line with at least two points to add a polygon):

➩ Drag red circles to move vertices.
➩ Click on the plus signs to add a point.
➩ Drag the green square to move all vertices.
➩ Drag the green circle to rotate the vertices.
➩ (Technical note: all polygons must be convex, and gray polygons should be non-overlapping and in the interior of the green polygons for the demo to work properly.)
📗 Players (influencer:target):
📗 0 for human, 1 for enumeration, 2 for projected gradient descent
➩ enumeration size (total of \(n^{2}\) sample points): \(n\) = .
➩ projected gradient maximum steps: .
➩ projected gradient initial learning rate: \(\alpha\) = (uses \(\dfrac{\alpha}{\sqrt{t}}\) in iteration \(t\)).
📗 Influencer weights: \(v\) =
📗 Influencer \(i\) minimizes \(\displaystyle\sum_{j} v_{i j} \left\|\hat{x}_{j} - t_{i}\right\|^{2}\) where \(\hat{x}_{j}\) is the final position of receiver \(j\) and \(t_{i}\) is the target position of influencer \(i\).
➩ Since \(\hat{x}_{j} = \displaystyle\sum_{i} w_{j i} x_{i}\), the gradient used in projected gradient descent is \(\nabla_{x_{i}} = 2 \displaystyle\sum_{j} v_{i j} w_{j i} \left(\hat{x}_{j} - t_{i}\right)\).
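A minimal sketch of one projected gradient step using the gradient above, with the \(\dfrac{\alpha}{\sqrt{t}}\) learning-rate schedule. Projecting onto the unit square with `np.clip` is an assumption; the official demo may additionally project onto the green polygons.

```python
import numpy as np

def pgd_step(i, x, v, w, t_pos, alpha, t_iter):
    """One projected gradient step for influencer i.

    x: (n, 2) influencer positions, w: (m, n) receiver weights,
    v: (n, m) influencer weights, t_pos: (n, 2) target (house) positions.
    """
    xhat = w @ x  # receiver positions: xhat_j = sum_i w[j, i] * x_i
    # Gradient from the text: 2 * sum_j v[i, j] * w[j, i] * (xhat_j - t_i)
    grad = 2.0 * np.sum(v[i][:, None] * w[:, i][:, None] * (xhat - t_pos[i]), axis=0)
    lr = alpha / np.sqrt(t_iter)  # alpha / sqrt(t) schedule in iteration t
    # Projection onto [0, 1]^2 (assumed feasible region)
    return np.clip(x[i] - lr * grad, 0.0, 1.0)

x = np.array([[0.2, 0.2], [0.8, 0.8]])      # two influencers
w = np.array([[0.5, 0.5]])                  # one receiver at the average
v = np.array([[1.0], [1.0]])                # each cares fully about that receiver
t_pos = np.array([[0.0, 0.0], [1.0, 1.0]])  # house (target) locations
new_x0 = pgd_step(0, x, v, w, t_pos, alpha=0.4, t_iter=1)
```

In this toy setup, the step moves influencer 0 toward its house at the origin, since that drags the receiver's average position closer to it.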
📗 Receivers:
📗 Receiver weights: \(w\) =
📗 Receiver \(j\) moves to position \(\hat{x}_{j} = \displaystyle\sum_{i} w_{j i} x_{i}\) where \(x_{i}\) is the position of influencer \(i\). \(w_{j \cdot}\) is not normalized to sum to 1.
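Since the update above is linear, stacking the influencer positions as rows turns all receiver positions into a single matrix product. The numbers below are purely illustrative.

```python
import numpy as np

x = np.array([[0.0, 0.0],
              [1.0, 1.0]])     # influencer positions, one per row
w = np.array([[0.25, 0.25],    # receiver 0: weights sum to 0.5
              [0.50, 0.50]])   # receiver 1: weights sum to 1 (a true average)
xhat = w @ x                   # xhat[j] = sum_i w[j, i] * x[i]
```

Because the rows of \(w\) need not sum to 1, \(\hat{x}_{j}\) is a general weighted combination of the influencer positions, not necessarily a convex average: receiver 0 above lands at \((0.25, 0.25)\) even though both influencers sit on the diagonal endpoints.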

# Submission

Your submission should contain (i) your player name (not necessarily your real name), (ii) your player icon (a single emoji from this list: Link), (iii) your house location, and (iv) the network that controls your player.






Last Updated: March 06, 2026 at 3:28 PM