NBA Players
I started by ranking NBA players because sports datasets are relatively easy to access; the choice of basketball over other sports was essentially random.
It turns out that although the data is easy to get, it is not clean. Unfortunately, the methodology described is quite sensitive to bad data: it can be thought of as a least squares regression model, where a few false extreme values are enough to throw the entire fit off. I made the model more resistant to this and cleaned the data, which greatly improved the quality of the results (judging by an eye test), but there’s still quite a way to go. Nonetheless, I feel that the data quality has reached a point where the methodology can be seen as working.
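As a toy illustration of that sensitivity (not the actual model or the NBA dataset), the sketch below fits a line to ten made-up points, then corrupts a single value. It compares an ordinary least squares slope with a Theil-Sen slope (the median of all pairwise slopes). Theil-Sen is only a stand-in for "a more resistant model" here; the function names and the data are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean, median


def ols_slope(xs, ys):
    """Ordinary least squares slope of y regressed on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def theil_sen_slope(xs, ys):
    """Median of all pairwise slopes; a few bad points barely move it."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


# Toy data (hypothetical): y = 2x exactly, then one corrupted record.
xs = list(range(10))
clean = [2.0 * x for x in xs]
dirty = clean[:-1] + [200.0]  # made-up bad entry; the true value is 18

print("clean:", ols_slope(xs, clean), theil_sen_slope(xs, clean))
print("dirty:", ols_slope(xs, dirty), theil_sen_slope(xs, dirty))
```

On this toy data, the single bad record drags the least squares slope from 2 to roughly 11.9, while the median-of-slopes estimate stays at 2. Cleaning the data and using a more resistant fit attack the same problem from two sides.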
One interesting thing to note is that most players seem to have a score below 0, and the distribution is slightly positively skewed, with a thinner tail on the downside than on the upside. I left out the legend because the graph includes essentially every player, which would make a legend useless.
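For readers less familiar with the terminology, here is a minimal sketch of how "most scores below 0" and "positively skewed" fit together. The scores are invented for illustration and are not taken from the results; `skewness` is a hypothetical helper that computes the population skewness (the mean cubed z-score).

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population skewness: mean cubed z-score (positive = longer right tail)."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Hypothetical scores: many slightly negative players, a few standouts.
scores = [-1.0, -0.8, -0.6, -0.5, -0.4, -0.3, -0.2, 0.1, 0.9, 2.8]

share_below_zero = sum(s < 0 for s in scores) / len(scores)
print("share below zero:", share_below_zero)
print("median:", median(scores))
print("skewness:", skewness(scores))
```

In this made-up sample, 70% of the scores are below 0 and the median is negative, yet the skewness is positive because the few large positive scores stretch the right tail.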
This methodology can be seen as measuring how valuable each player is to their team, or how hard it would be for that team to replace them. For example, Kawhi Leonard does very well for his team at the start of the graph, which can be interpreted as Kawhi contributing a significant amount to his team’s success.
Looking at the upside results in a little more detail, the Utah Jazz’s Donovan Mitchell and the Milwaukee Bucks’ Giannis Antetokounmpo seem to have had historically good 2021 seasons: winning the championship seems to have pushed Antetokounmpo’s score to historical heights, and Mitchell seems to just be a one-man army. The Philadelphia 76ers’ Joel Embiid also seems to do well. The most surprising result is that the first non-superstar player is the Milwaukee Bucks’ Khris Middleton, who had an exceptional year compared to his average (I suspect this contributed greatly to the Bucks winning the title).
On the downside, Jaylen Hoard and Lucas Nogueira should be given quite a bit more practice time before being allowed to step on the court for teams that aren’t trying to tank. The most well-known athlete with a low score was Rondae Hollis-Jefferson, whose score was closer to the aforementioned downside group than to 0.
These results are not perfect because the data still needs quite a bit of cleaning; I suspect that finishing that work will bring the results much closer to perfect. However, I hope this is enough to give you a general sense of how the methodology works, its general use cases, and its effectiveness. I plan to use this data to build a player value forecaster at some point, although I suspect that will be easier than creating the unbiased dataset. If you have any questions or suggestions (e.g., for better data sources), feel free to reach out over email.