Names

Jude Ogden, Jash Kadakia, Tony Zhao

Introduction

In modern American society, professional football receives more attention than almost any other pastime or professional sport, and no position in the game faces more pressure than quarterback. Young quarterbacks are often expected to immediately elevate the team around them, and NFL front offices have increasingly little patience with draft picks they deem to be “busts”. In this analysis, we set out to determine how a quarterback’s draft position relates to his career passing yards, in order to better inform fans’ expectations for quarterbacks as they are drafted into the NFL. To do so, we use data on all NFL draft picks from 1980 onward.

How is a quarterback’s career passing yardage influenced by the pick with which he was taken in the NFL Draft?

This analysis finds that a statistically significant negative relationship exists between quarterback draft pick position and career passing yards.

Background

To complete this analysis, we first had to acquire a dataset detailing each NFL draft from 1980 onward, including individual player draft position and their subsequent NFL careers.

https://github.com/nflverse/nflverse-data/releases/tag/draft_picks

NFL Draft Data (draft_picks.csv)

This data was compiled by Pro Football Reference, a website operated by the larger sports statistics organization Sports Reference. The original dataset lists every draft pick from 1980 to 2024, including each player’s name, university, age, drafting team, and position. Career data is also available in this dataset, with each measurable statistic for a player (average rushing yards, passing yards, games played, etc.) listed alongside career honors, such as AP All-Pro selections and Hall of Fame status.

To complete our data analysis, we kept only two of the columns in this original dataset. First, we filtered the file to keep only players whose position was listed as “QB”. Once our modified data consisted of only quarterbacks, we selected the pick and pass_yards columns, which are our independent and dependent variables, respectively. Some quarterbacks had their passing yards listed as NA, as they either never suited up or never had a pass attempt in an NFL game. These players were removed from our data, but players with 0 career passing yards were kept. A pick value of 100 means that the quarterback was the 100th overall selection in his draft. The value recorded in pass_yards is a quarterback’s cumulative passing yardage over his career, so a quarterback with a value of 10,000 in this column passed for 10,000 yards over his career.
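The filtering steps above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for the report. The column names pick and pass_yards come from the text; the column name position, the function name load_quarterbacks, and the sample rows are assumptions made up for the example.

```python
import csv
import io

# Made-up rows in the assumed shape of draft_picks.csv (illustrative values only).
SAMPLE_CSV = """pick,position,pass_yards
1,QB,30000
1,WR,NA
100,QB,0
199,QB,NA
"""

def load_quarterbacks(csv_file):
    """Return (pick, pass_yards) pairs for quarterbacks with recorded passing yards."""
    rows = []
    for row in csv.DictReader(csv_file):
        if row["position"] != "QB":
            continue  # keep quarterbacks only
        if row["pass_yards"] in ("", "NA"):
            continue  # drop missing yardage; a recorded 0 is kept
        rows.append((int(row["pick"]), float(row["pass_yards"])))
    return rows

quarterbacks = load_quarterbacks(io.StringIO(SAMPLE_CSV))
print(quarterbacks)  # [(1, 30000.0), (100, 0.0)]
```

Note that the quarterback with 0 yards survives the filter while the one listed as NA does not, which matches the distinction drawn above.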

With these two columns isolated and cleaned, we reduced the data to one output (passing yards) per input (draft position) so that it could be plotted for a linear regression analysis. To do this, we grouped the dataset by pick and created a third column, meanyardsthousands, holding the average career passing yards of all quarterbacks taken in the same draft slot. These averages were divided by 1000 to make data visualization easier. This was the final dataset with which we created our figures and performed a linear regression analysis to determine any potential impact of quarterback draft position on career passing yards.
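As a hedged sketch of this grouping step, the Python snippet below averages career passing yards within each draft slot and rescales the result to thousands of yards. The function name mean_yards_by_pick and the sample values are hypothetical; only the idea of one averaged value per pick, divided by 1000, comes from the text.

```python
from collections import defaultdict

def mean_yards_by_pick(quarterbacks):
    """Average career passing yards per draft slot, in thousands of yards."""
    yards_by_pick = defaultdict(list)
    for pick, pass_yards in quarterbacks:
        yards_by_pick[pick].append(pass_yards)
    return {
        pick: sum(yards) / len(yards) / 1000
        for pick, yards in sorted(yards_by_pick.items())
    }

# Made-up (pick, pass_yards) pairs for illustration.
sample = [(1, 30000.0), (1, 10000.0), (100, 0.0), (100, 5000.0), (199, 2000.0)]
mean_yards_thousands = mean_yards_by_pick(sample)
print(mean_yards_thousands)  # {1: 20.0, 100: 2.5, 199: 2.0}
```

One consequence of this design is that every draft slot counts equally in the regression, whether its average is based on one quarterback or many.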

Plots

Figure 1 indicates a negative relationship between draft position and career passing yards, but further analysis is needed to determine the significance of this relationship. The narrow standard error ribbon indicates that the fitted line is estimated precisely, although it does not by itself show that a linear model is appropriate.

Figure 2 indicates that a regression is appropriate for this analysis, within the scope of our class curriculum. The residuals appear to be evenly distributed, aside from a distinct cutoff caused by the lower limit of 0 career passing yards. It is important to note that this cutoff results in a funneling effect, which introduces a shortcoming in our statistical analysis of this data. While this shortcoming is covered in more detail in our discussion, we still believe this analysis to be the most valid method given our understanding of statistical models.

Analysis

Statistical Model

We model linear regressions as \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \text{, for } i = 1, \dots, n \] \[ \text{where } \varepsilon_i \sim N(0, \sigma) \]

Here \(X_i\) is a draft position, \(Y_i\) is the mean career passing yards (in thousands) of quarterbacks taken at that position, \(\beta_0\) is the model’s y-intercept, \(\beta_1\) is the model’s slope, and \(\varepsilon_i\) is the error around each estimated career passing yard value by draft position.
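To make the model concrete, the sketch below estimates the intercept and slope by ordinary least squares, the standard fitting method for this kind of model. It is written in Python for illustration only; the function name fit_line and the four data points are made up, and the report’s own fitting code is not shown.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up draft slots and mean career yards (in thousands).
picks = [1, 2, 3, 4]
mean_yards = [10.0, 8.0, 9.0, 5.0]
intercept, slope = fit_line(picks, mean_yards)
print(round(intercept, 3), round(slope, 3))
```

A negative slope in this toy data plays the same role as the negative \(\beta_1\) tested in the analysis.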

The linear regression model relies on three assumptions:

  1. The relationship between X and Y is actually linear, as opposed to a curved/non-linear relationship. - We treat this assumption as met (Figure 2). The residuals do not follow an obvious curved pattern around zero.
  2. The errors are normally distributed around 0. - We treat this assumption as met (Figure 2). There is a relatively even distribution of points on both sides of zero.
  3. The errors have constant variance, which does not change with X. - This assumption is not met (Figure 2). Because career passing yardage has a set minimum (0), a funneling effect is evident in our residuals plot: the spread of the residuals becomes smaller and smaller as pick number increases.

While our third assumption for linear regression models is violated, we have continued to use this analysis for our study, given that it is the most valid method we have learned in this class. Because non-constant variance can distort standard errors and p-values, our conclusions can provide evidence for a relationship, but cannot comment on whether or not this relationship is linear.
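A rough numerical companion to the residual plot is to compare the spread of the residuals for early picks against the spread for late picks. The Python sketch below does this under stated assumptions: the helper names residuals and spread_by_half, the fitted line (intercept 10, slope -1), and the data are all made up for illustration. It is an informal check, not a formal test of constant variance.

```python
import statistics

def residuals(xs, ys, intercept, slope):
    """Observed value minus fitted value for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def spread_by_half(xs, res):
    """Standard deviation of residuals in the lower and upper halves of x."""
    ordered = [r for _, r in sorted(zip(xs, res))]
    mid = len(ordered) // 2
    return statistics.pstdev(ordered[:mid]), statistics.pstdev(ordered[mid:])

# Made-up data in which early picks vary far more than late picks.
picks = [1, 2, 3, 4, 5, 6]
mean_yards = [13.0, 4.0, 10.0, 6.5, 4.5, 4.2]
res = residuals(picks, mean_yards, intercept=10.0, slope=-1.0)
early_spread, late_spread = spread_by_half(picks, res)
print(early_spread > late_spread)  # True for this funnel-shaped sample
```

A much larger spread in the first half than in the second is the numerical signature of the funneling effect described above.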

Hypotheses

Our null hypothesis takes the form: \(H_0: \beta_1 = 0\)

Our alternative hypothesis is: \(H_a: \beta_1 < 0\)

We use a t-test on the regression slope to assess the evidence against the null hypothesis.

Regression Test

r = -0.430

This correlation coefficient indicates a moderate negative relationship between our two variables.
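For readers unfamiliar with how a correlation coefficient is obtained, here is a minimal Python sketch of the Pearson formula. The function name pearson_r and the data are illustrative assumptions; the snippet does not reproduce the reported value of r, which depends on the full dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up draft slots and mean career yards (in thousands).
r = pearson_r([1, 2, 3, 4], [10.0, 8.0, 9.0, 5.0])
print(round(r, 3))
```

The sign of r always matches the sign of the least-squares slope, since both share the same numerator.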

\(\beta_1\) = -0.515

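This estimated slope means that, for each one-pick increase in draft position, the predicted average career passing yardage of a quarterback decreases by about 515 yards. The slope is expressed in thousands of yards because the response variable was divided by 1000.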
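The unit conversion behind this interpretation can be written out explicitly. The ten-pick comparison is an illustrative extension of the same arithmetic and assumes the fitted line applies across that range.

\[ \hat{\beta}_1 = -0.515 \ \text{thousand yards per pick} = -0.515 \times 1000 \ \text{yards per pick} = -515 \ \text{yards per pick} \]

\[ 10 \ \text{picks} \times (-515 \ \text{yards per pick}) = -5150 \ \text{yards} \]

Under the fitted line, two draft slots ten picks apart therefore differ by about 5,150 yards in predicted average career passing yardage.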

p-value = \(4.82 \times 10^{-11}\)

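This p-value was obtained by calculating a two-sided p-value through the pt() function and dividing this value by two. Halving is appropriate here because the estimated slope is negative, which is the direction specified by our alternative hypothesis.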
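The relationship between the two-sided and one-sided p-values can be sketched as follows. The notation is an assumption for illustration: \(SE(\hat{\beta}_1)\) is the standard error of the estimated slope, \(n\) is the number of draft slots in the final dataset, and \(T_{n-2}\) is a t-distributed random variable with \(n - 2\) degrees of freedom. Neither the standard error nor \(n\) is reported here, so the test statistic itself is not reproduced.

\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \qquad p_{\text{two-sided}} = 2 \, P(T_{n-2} \ge |t|) \]

\[ p_{\text{one-sided}} = P(T_{n-2} \le t) = \frac{p_{\text{two-sided}}}{2} \quad \text{when } t < 0 \]

If the estimated slope had instead been positive, the one-sided p-value for \(H_a: \beta_1 < 0\) would be \(1 - p_{\text{two-sided}}/2\) rather than half the two-sided value.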

Interpretation

We find significant evidence that a negative relationship exists between quarterback draft position and career passing yards (\(p = 4.82 \times 10^{-11}\), one-sided t-test).

Discussion

Further interpretation

Our analysis has shown a significant negative relationship between quarterback draft position and career passing yards. On average, the later a quarterback is drafted, the fewer career passing yards he tends to have.

Shortcomings and Future Work

We conducted this analysis through the use of a linear regression, as this was the most suitable model for the data that we had explored in class. While the data can largely be fit by a linear model, the residual plot shows that not every model assumption holds. Namely, the presence of a lower limit on career passing yards (0) results in a funneling effect as pick values increase. This effect is also reflected in the way that points below the line of best fit are concentrated closer to it than points above it.

The structure of the NFL and its draft also complicates this analysis. Quarterbacks are not the only position selected in the draft, and while data exists for 45 years of draft selections (1980 to 2024), not every pick value has had a quarterback selected, so career statistics cannot be determined for every draft slot. Additionally, quarterbacks selected later in the draft are often never expected to play in the regular season; some serve as backups while others are cut before they can make it onto an NFL roster. Therefore, it is important to note that these results are not a reflection of quarterback skill alone, but more so a reflection of NFL roster construction and the role the draft plays in constructing teams.

For future analyses of these data, we recommend addressing some of our shortcomings by accounting for career playing time. Does a quarterback’s yards per minute played change based on his draft position? We were also interested in specific teams, and whether late-round quarterback picks were more successful in some organizations than in others. Our analysis also did not account for a quarterback’s career arc. Further research is needed on how a player’s passing statistics change across multiple seasons, and on whether quarterback passing statistics have changed over the decades as NFL offenses modernize and more specialized athletic training becomes available to players.

References

Ho, T. (2024). NFL Draft Picks and Player Career Statistics, 1980 - 2024. https://github.com/nflverse/nflverse-data/releases/tag/draft_picks.