Jude Ogden, Jash Kadakia, Tony Zhao
In modern American society, professional football receives more attention than almost any other pastime or professional sport, and no position in the game faces more pressure than quarterback. Young quarterbacks are often expected to elevate the teams around them immediately, and NFL front offices have increasingly little patience for draft picks they deem to be “busts”. In this analysis, we set out to determine the impact of a quarterback’s draft position on his career passing yards, in order to better inform fans’ expectations for quarterbacks as they enter the NFL. To do so, we work with data on every NFL draft pick from 1980 through 2024.
How is a quarterback’s career passing yardage influenced by the pick with which he was taken in the NFL Draft?
This analysis finds that a statistically significant negative relationship exists between quarterback draft pick position and career passing yards.
To complete this analysis, we first needed a dataset covering every NFL draft from 1980 onward, including each player’s draft position and subsequent NFL career.
https://github.com/nflverse/nflverse-data/releases/tag/draft_picks
This data was compiled by Pro Football Reference, a website operated by the larger sports statistics organization Sports Reference. The original dataset lists every draft pick from 1980 to 2024, including each player’s name, university, age, drafting team, and position. Career data is also included: each measurable statistic for a player (average rushing yards, passing yards, games played, etc.) is listed alongside long-term honors such as AP All-Pro selections and Hall of Fame status.
To complete our data analysis, we kept only two of the columns in this original dataset. First, we filtered the file to keep only players whose position was listed as “QB”. Once our modified data consisted only of quarterbacks, we selected the pick and pass_yards columns as our independent and dependent variables, respectively. Some quarterbacks had their passing yards listed as NA because they either never suited up or never attempted a pass in an NFL game. These players were removed from our data, but players with 0 career passing yards were kept. A pick value of 100 means that the quarterback was the 100th overall selection in his draft. The value recorded in pass_yards is a quarterback’s cumulative passing yardage over his career; a quarterback with a value of 10,000 in this column passed for 10,000 yards over his career.
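The report's own filtering code is not shown, so the sketch below expresses the same cleaning rules in Python using only the standard library. It assumes each draft pick is a dictionary with position, pick, and pass_yards keys (mirroring the column names in the text) and that an NA value is represented as None; the rows and the helper name clean_quarterbacks are hypothetical, not taken from the dataset.

```python
# Hypothetical rows shaped like the draft-picks table. Only the fields named
# in the text (position, pick, pass_yards) plus a name are included; the
# values are invented for illustration and are NOT from the dataset.
rows = [
    {"name": "QB A", "position": "QB", "pick": 1, "pass_yards": 35000},
    {"name": "QB B", "position": "QB", "pick": 100, "pass_yards": None},  # NA: never attempted a pass
    {"name": "RB C", "position": "RB", "pick": 12, "pass_yards": None},   # not a quarterback
    {"name": "QB D", "position": "QB", "pick": 199, "pass_yards": 0},     # kept: 0 is a real value
]


def clean_quarterbacks(rows):
    """Keep quarterbacks with a recorded pass_yards value as (pick, pass_yards) pairs."""
    return [
        (r["pick"], r["pass_yards"])
        for r in rows
        if r["position"] == "QB" and r["pass_yards"] is not None
    ]


qbs = clean_quarterbacks(rows)
print(qbs)  # [(1, 35000), (199, 0)]
```

Testing with `is not None` rather than truthiness is the design choice that matters here: it drops NA values while keeping quarterbacks with exactly 0 career passing yards, as the text describes.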
With these two columns isolated and cleaned, we needed to reduce the data to one output (passing yards) per input (draft position) in order to produce an appropriate plot for a linear regression analysis. To do this, we grouped the dataset by pick and created a third column, meanyardsthousands, holding the average career passing yards of all NFL quarterbacks taken in the same draft slot. These averages were divided by 1000 to make the data easier to visualize, so meanyardsthousands is measured in thousands of yards. This was the final dataset with which we created our figures and performed a linear regression analysis to determine any potential impact of quarterback draft position on career passing yards.
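The grouping step can be sketched in the same way. This assumes the input is a list of (pick, pass_yards) pairs like those produced by the cleaning step; the function name mean_yards_thousands_by_pick and the sample pairs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_yards_thousands_by_pick(pairs):
    """Group (pick, pass_yards) pairs by pick; return {pick: mean yards / 1000}."""
    by_pick = defaultdict(list)
    for pick, yards in pairs:
        by_pick[pick].append(yards)
    # One value per draft slot, in thousands of yards, ordered by pick.
    return {pick: mean(ys) / 1000 for pick, ys in sorted(by_pick.items())}


# Hypothetical input: two quarterbacks taken at pick 1, one at pick 32.
pairs = [(1, 40000), (1, 20000), (32, 5000)]
print(mean_yards_thousands_by_pick(pairs))  # {1: 30.0, 32: 5.0}
```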
Figure 1 indicates a negative relationship between draft position and career passing yards, but further analysis is needed to determine whether this relationship is significant. The narrow standard error ribbon indicates that the fitted line is estimated precisely; on its own, however, it does not show that a linear model fits the data well.
Figure 2 indicates that a linear regression is reasonable for this analysis, within the scope of the methods covered in our class. The data appear evenly distributed, aside from a distinct cutoff at 0 career passing yards, below which no values can fall. This cutoff produces a funneling effect, which is a shortcoming of our statistical analysis of these data. While this shortcoming is examined further in our discussion, we still believe this analysis to be the most valid method available given our understanding of statistical models.
We model linear regressions as \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \text{, for } i = 1, \ldots, n \] \[ \text{where } \varepsilon_i \sim N(0, \sigma) \]
where \(X_i\) is the draft pick, \(Y_i\) is the mean career passing yards (in thousands) of quarterbacks taken at that pick, \(\beta_0\) is the model’s predicted y-intercept, \(\beta_1\) is the model’s slope, and \(\varepsilon_i\) is the random error of observation \(i\) around the fitted line.
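The slope and intercept of this model are ordinary least-squares estimates, given in closed form by \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}\). The sketch below (not the authors' code) computes these together with the correlation coefficient r and the slope's standard error, whose ratio is the t statistic used in the hypothesis test. The function name fit_simple_ols and the toy data are hypothetical.

```python
import math


def fit_simple_ols(xs, ys):
    """Least-squares fit of Y_i = b0 + b1 * X_i + e_i.

    Returns (b0, b1, r, se_b1); se_b1 uses the residual variance
    with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the slope's standard error")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, r, se_b1


# Hypothetical toy data (NOT from the dataset): picks and mean yards in thousands.
picks = [1, 2, 3, 4]
mean_yards_k = [10, 8, 7, 3]
b0, b1, r, se_b1 = fit_simple_ols(picks, mean_yards_k)
t_stat = b1 / se_b1  # test statistic for H0: beta_1 = 0, with n - 2 df
# For this toy data: intercept ≈ 12.5, slope ≈ -2.2, r ≈ -0.965
```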
The linear regression model relies on three implicit assumptions:
While our third assumption for linear regression models is violated, we have continued to use this analysis for our study, given that it is the most valid method we have learned in this class. Our conclusions can provide strong evidence for a relationship, but cannot comment on whether or not this relationship is linear.
Our null hypothesis takes the form: \(H_0: \beta_1 = 0\)
Our alternative hypothesis is: \(H_a: \beta_1 < 0\)
We use a t-test on the regression slope to assess the evidence against the null hypothesis.
r = -0.430
This correlation coefficient indicates a moderate negative relationship between our two variables.
\(\beta_1\) = -0.515
This estimated slope means that, for each one-position increase in draft pick (that is, each later selection), the model predicts that the average career passing yardage of quarterbacks taken at that pick decreases by 0.515 thousand, or about 515, yards.
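p-value = \(4.82 \times 10^{-11}\)
This one-sided p-value was obtained by calculating a two-sided p-value with the pt() function and dividing it by two. Halving is valid here because the estimated slope is negative, which is the direction specified by the alternative hypothesis.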
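In R, pt(t, df) returns the lower-tail probability of a Student t distribution. As a stand-in that needs no statistics library, the Python sketch below evaluates the t distribution through the standard identity \(P(|T| \ge |t|) = I_{\nu/(\nu + t^2)}(\nu/2, 1/2)\), where \(I\) is the regularized incomplete beta function (computed here with a modified Lentz continued fraction), and then halves the two-sided value in the direction of \(H_a: \beta_1 < 0\). All function names are hypothetical; in practice R's pt() or scipy.stats.t.cdf would be the usual choice.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered coefficient of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered coefficient.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:  # converged
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for a Student t variable with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def t_lower_one_sided_p(t, df):
    """P(T <= t): the one-sided p-value for H_a: beta_1 < 0."""
    half = t_two_sided_p(t, df) / 2.0
    return half if t < 0 else 1.0 - half
```

For example, t_lower_one_sided_p(-2.228, 10) is close to 0.025, consistent with the familiar two-sided 5% critical value of about 2.228 at 10 degrees of freedom. For the regression slope, t would be the estimated slope divided by its standard error, with n - 2 degrees of freedom, where n is the number of draft slots in the grouped data.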
We find significant evidence that a negative relationship exists between quarterback draft position and career passing yards (\(p = 4.82 \times 10^{-11}\), one-sided t-test).
Our analysis has shown a significant negative relationship between quarterback draft position and career passing yards: on average, the later a quarterback is drafted, the fewer career passing yards he tends to accumulate.
We conducted this analysis using linear regression, as it was the most appropriate model for these data among those we explored in class. While the data can largely be fit well by a linear model, the regression plots suggest that some assumptions may not hold. Namely, the lower limit on career passing yards (0) produces a funneling effect as pick values increase. This effect is also reflected in how passing-yard values below the line of best fit cluster more closely to it than values above it.
The structure of the NFL and its draft also introduces confounding factors into this analysis. Quarterbacks are not the only position selected in the draft, and while data exist for nearly 50 years of draft selections, not every pick number has been used on a quarterback whose career statistics can be measured. Additionally, quarterbacks selected later in the draft are often never expected to play in the regular season; some serve as backups, while others are cut before they make an NFL roster. It is therefore important to note that these results do not measure quarterback skill; they more likely reflect NFL roster construction and the role the draft plays in building teams.
For future analyses of these data, we recommend addressing some of these shortcomings by normalizing for career playing time: does a quarterback’s passing yardage per minute played change with his draft position? We were also interested in specific teams, and in whether late-round quarterback picks were more successful in some organizations than in others. Finally, our analysis did not account for a quarterback’s career arc. Further research could examine how a player’s passing statistics change across multiple seasons, and whether quarterback passing statistics have shifted over the decades as NFL offenses modernize and more specialized athletic training becomes available to players.
Ho, T. (2024). NFL Draft Picks and Player Career Statistics, 1980 - 2024. https://github.com/nflverse/nflverse-data/releases/tag/draft_picks.