
# These competitions are optional for CS540 students, unless you want to use them for participation grades.


# CP4 Competition Instruction

📗 In this competition, you will implement the policy gradient algorithm to find an approximate equilibrium of a Markov game. In particular, you are one of many players (influencers) whose house is at a fixed location in \(\left[0, 1\right] \times \left[0, 1\right]\), and your goal is to get the dog (receiver) to a location as close as possible to your house, while the other players, with possibly different house locations, want to do the same. In every round, the players simultaneously move to a location within a fixed radius of their current location. After every round, the dog moves to the center (average location) of all players.
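One round of these dynamics can be sketched as follows. This is an assumption-laden illustration, not the official environment: the movement radius `MAX_STEP` and the clipping to the unit square are made-up parameters for the demo.

```python
import numpy as np

# Sketch of one round: each influencer proposes a move, the move is
# clipped to a fixed radius and to the unit square, then the receiver
# (dog) jumps to the average of all influencer positions.

MAX_STEP = 0.05  # assumed per-round movement radius (illustrative)

def step(positions, moves, max_step=MAX_STEP):
    """positions: (n, 2) influencer locations; moves: (n, 2) desired offsets."""
    norms = np.linalg.norm(moves, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_step / np.maximum(norms, 1e-12))
    new_positions = np.clip(positions + moves * scale, 0.0, 1.0)
    dog = new_positions.mean(axis=0)  # dog moves to the center of all players
    return new_positions, dog

positions = np.array([[0.1, 0.1], [0.9, 0.9]])
moves = np.array([[0.2, 0.0], [0.0, -0.2]])
new_positions, dog = step(positions, moves)
```

Each desired move is rescaled so its length never exceeds the radius, which is why both influencers above travel only 0.05 despite requesting 0.2.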

📗 Submit a policy neural network to play against other students.

# Competition

 




📗 Editor mode:
📗 Number of influencers (2 - 5):
➩ Target positions:

📗 Number of receivers (1 - 5):
📗 Icon size:
📗 Green regions are valid positions for the influencers.
📗 Gray regions are invalid positions for the influencers (but not for the receivers).
➩ Vertices (add a line with at least two points to add a polygon):

➩ Drag red circles to move vertices.
➩ Click on the plus signs to add a point.
➩ Drag the green square to move all vertices.
➩ Drag the green circle to rotate the vertices.
➩ (Technical note: all polygons must be convex, and gray polygons should be non-overlapping and in the interior of the green polygons for the demo to work properly.)
📗 Players (influencer:target):
📗 0 for human, 1 for enumeration, 2 for projected gradient descent
➩ enumeration size (total of \(n^{2}\) sample points): \(n\) = .
➩ projected gradient maximum steps: .
➩ projected gradient initial learning rate: \(\alpha\) = (uses \(\dfrac{\alpha}{\sqrt{t}}\) in iteration \(t\)).
📗 Influencer weights: \(v\) =
📗 Influencer \(i\) minimizes \(\displaystyle\sum_{j} v_{i j} \left\|\hat{x}_{j} - t_{i}\right\|^{2}\) where \(\hat{x}_{j}\) is the final position of receiver \(j\) and \(t_{i}\) is the target position of influencer \(i\).
➩ Since \(\hat{x}_{j} = \displaystyle\sum_{i} w_{j i} x_{i}\), the gradient used in projected gradient descent is \(\nabla_{x_{i}} = 2 \displaystyle\sum_{j} v_{i j} w_{j i} \left(\hat{x}_{j} - t_{i}\right)\).
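A minimal sketch of one projected gradient step using the gradient above, with the \(\dfrac{\alpha}{\sqrt{t}}\) learning-rate schedule. Projecting onto the unit square with `np.clip` is an assumption; the official demo may additionally project onto the green polygons.

```python
import numpy as np

def pgd_step(i, x, v, w, t_pos, alpha, t_iter):
    """One projected gradient step for influencer i.

    x: (n, 2) influencer positions, w: (m, n) receiver weights,
    v: (n, m) influencer weights, t_pos: (n, 2) target (house) positions.
    """
    xhat = w @ x  # receiver positions: xhat_j = sum_i w[j, i] * x_i
    # Gradient from the text: 2 * sum_j v[i, j] * w[j, i] * (xhat_j - t_i)
    grad = 2.0 * np.sum(v[i][:, None] * w[:, i][:, None] * (xhat - t_pos[i]), axis=0)
    lr = alpha / np.sqrt(t_iter)  # alpha / sqrt(t) schedule in iteration t
    # Projection onto [0, 1]^2 (assumed feasible region)
    return np.clip(x[i] - lr * grad, 0.0, 1.0)

x = np.array([[0.2, 0.2], [0.8, 0.8]])      # two influencers
w = np.array([[0.5, 0.5]])                  # one receiver at the average
v = np.array([[1.0], [1.0]])                # each cares fully about that receiver
t_pos = np.array([[0.0, 0.0], [1.0, 1.0]])  # house (target) locations
new_x0 = pgd_step(0, x, v, w, t_pos, alpha=0.4, t_iter=1)
```

In this toy setup, the step moves influencer 0 toward its house at the origin, since that drags the receiver's average position closer to it.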
📗 Receivers:
📗 Receiver weights: \(w\) =
📗 Receiver \(j\) moves to position \(\hat{x}_{j} = \displaystyle\sum_{i} w_{j i} x_{i}\) where \(x_{i}\) is the position of influencer \(i\). \(w_{j \cdot}\) is not normalized to sum to 1.
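Since the update above is linear, stacking the influencer positions as rows turns all receiver positions into a single matrix product. The numbers below are purely illustrative.

```python
import numpy as np

x = np.array([[0.0, 0.0],
              [1.0, 1.0]])     # influencer positions, one per row
w = np.array([[0.25, 0.25],    # receiver 0: weights sum to 0.5
              [0.50, 0.50]])   # receiver 1: weights sum to 1 (a true average)
xhat = w @ x                   # xhat[j] = sum_i w[j, i] * x[i]
```

Because the rows of \(w\) need not sum to 1, \(\hat{x}_{j}\) is a general weighted combination of the influencer positions, not necessarily a convex average: receiver 0 above lands at \((0.25, 0.25)\) even though both influencers sit on the diagonal endpoints.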

# Submission

Your submission should contain (i) your player name (not necessarily your real name), (ii) your player icon (a single emoji from this list: Link), (iii) your house location, and (iv) the network that controls your player.






Last Updated: March 06, 2026 at 3:28 PM