Group 317-5: Stephen Ling, Lewis Clay Ballard, Hongwei Tian, Chester Zhang
Introduction
Hockey is a sport in which two teams play against each other by trying to maneuver a ball or a puck into the opponent’s goal using a hockey stick. We chose to focus on hockey because we were all interested in sports and it is easy to obtain a comprehensive data frame of different sports leagues. In this project, we plan to answer the questions: is there a trend in the birthdays of NHL players? Will taking more shots in a competition increase the goal percentage? What is the relationship between penalty minutes of NHL players and their age? What are the body features (Weight, Height, BMI) of NHL players in different positions? Does the experience in the league relate to the goal percentage of NHL players?
Based on our knowledge of hockey before analyzing the data, we expect the following: the number of NHL players increases as birthdays get closer to January; taking more shots in a competition is associated with goal percentage; younger players tend to receive more penalty minutes; players at different positions have distinct body features; and experience in the league is associated with the goal percentage of NHL players.
In general, there are significant trends in the features (birthday, BMI, height, weight) of NHL players, and some factors have a strong correlation with NHL players’ performances, but some factors do not.
Background
Data
The data set was collected by a hockey fan and published on Kaggle; the data come mainly from hockey-related sports websites. The data frame contains 40 columns (variables) and 27,319 rows, covering 3,340 players in the National Hockey League (NHL) from 1976 to 2020. We use 15 variables from the data set:
- Name, the name of the player.
- Date_of_birth, the player’s birthday.
- Goals, the number of goals the player scored in each season year.
- Assists, the number of assists the player made in each season year.
- Points, points = goals + assists, which measures a player’s general performance in each season year.
- Penalty_Minutes, the total minutes the player spends in the penalty box.
- Shots_on_Goal, the number of shots the player takes in each season year.
- Shooting_Percentage, shooting percentage = goals / shots on goal, which measures the goal ratio of the player in each season year.
- Position, the position the player plays in each season year: Center, Defense, Forward, Goaltender, Left Wing, or Right Wing.
- Height, the measured height of the player in each season year.
- Weigth, the measured weight of the player in each season year.
- Body_Mass_Index, the measured BMI of the player in each season year.
- Age, the age of the player in each season year.
- Experience, the number of years the player has been in the NHL.
- Time_on_Ice_per_Game, the time (in the form “minute:second”) that the player is on the ice per game in each season year.
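The derived variables in this list follow directly from their definitions. The short Python sketch below illustrates them on a single made-up season row; the function names and the example values are our own assumptions for illustration and are not taken from the data set. The ratio is returned as a fraction, so it would need to be multiplied by 100 if the data set stores a percentage.

```python
def points(goals, assists):
    """Points as defined above: goals plus assists."""
    return goals + assists


def shooting_ratio(goals, shots_on_goal):
    """Goals divided by shots on goal; None when no shots were taken."""
    if shots_on_goal == 0:
        return None
    return goals / shots_on_goal


def time_on_ice_seconds(text):
    """Convert a "minute:second" string such as "18:30" to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Hypothetical season row (illustrative values only, not real data).
row = {
    "Goals": 20,
    "Assists": 30,
    "Shots_on_Goal": 160,
    "Time_on_Ice_per_Game": "18:30",
}

print(points(row["Goals"], row["Assists"]))                 # 50
print(shooting_ratio(row["Goals"], row["Shots_on_Goal"]))   # 0.125
print(time_on_ice_seconds(row["Time_on_Ice_per_Game"]))     # 1110
```

Returning None for a season with zero shots avoids a division by zero for players who barely played.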
Source of Data
Background Information
To make our analysis easier to follow, we first explain some background terms.
- The National Hockey League (NHL) is an organization of professional ice hockey teams in North America, formed in 1917. The NHL became the strongest league in North America in 1926.
- Body Mass Index (BMI) is a person’s weight in kilograms divided by the square of height in meters. A high BMI can be an indicator of high body fatness.
- General Rules of Hockey: Hockey players can only hit the puck with their stick. A goal can be scored only from a field goal, a power play (caused by an opponent’s penalty), or a penalty shot. Hockey players may not trip, push, charge, interfere with, or excessively physically handle an opponent in any way.
- A penalty minute is a punishment in hockey for an infringement of the rules. A player cannot participate in the match for a certain amount of time, depending on the severity of the infraction, and most penalty minutes are caused by physical conflict during the match.
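As a small worked example of the BMI definition given in this list, the sketch below computes BMI from weight in kilograms and height in centimeters. The function name and the example player (90 kg, 185 cm) are hypothetical and chosen only for illustration.

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index: weight in kilograms over squared height in meters."""
    height_m = height_cm / 100  # convert centimeters to meters
    return weight_kg / height_m ** 2


# Hypothetical player: 90 kg and 185 cm.
print(round(bmi(90, 185), 1))  # 26.3
```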
Unusual Influencing Factors
- There are missing values in the data, and even though we drop missing values during analysis, this may still influence our interpretation of results.
- Some players play very little or not at all during a season year, which is an unusual influencing factor: these outliers may affect the distribution of data, which may affect our interpretation of results.
- The team environment is also an unusual influencing factor: a vigorous team may help players unlock their potential, and a passive team may affect players’ performance negatively.
- Finally, some outliers in scoring goals, weight, height, and BMI will also affect the distribution of the whole data set, which may affect our regression analysis and interpretation of body features of NHL players.
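To make the first and last points concrete, the sketch below shows one possible way to drop rows with missing values and to flag outliers. The 1.5 × IQR rule used here is a common convention that we introduce as an assumption; it is not necessarily the rule used in our analysis, and the helper names and sample values are hypothetical.

```python
import statistics


def drop_missing(rows, columns):
    """Keep only rows in which every listed column has a value."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical rows and weights (illustrative values only).
rows = [
    {"Height": 185, "Weigth": 90},
    {"Height": None, "Weigth": 88},
]
print(len(drop_missing(rows, ["Height", "Weigth"])))   # 1
print(iqr_outliers([80, 85, 88, 90, 92, 95, 140]))     # [140]
```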
Focuses
- We have two general focuses in our analysis of the data. The first focus is the features of NHL players, including birthdays and body features (weight, height, BMI). The second focus is exploring factors that influence NHL players’ performance, including shooting rate vs. goal percentage and experience of NHL players vs. goal percentage.
Analysis
Trend in Birthdays of NHL Players
- We were inspired to explore the birthdays of NHL players because a statistics book titled Outliers mentions that Canadian hockey players’ birthdays are greatly affected by the age cutoff used in hockey player selection. We first convert the column Date_of_birth from character to date format. Then, we group the data frame by players’ names and birthdays, so that each player is counted once, and summarize the count of birthdays. Finally, we create a histogram of the players’ birth months.

- The histogram shows that January has the highest count, with over 350 players born in that month, and that the count decreases clearly from January to August. It is therefore reasonable to conclude that there is a trend in the birthdays of NHL players: the number of players increases as birthdays get closer to January.
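The steps described in this section (parse the date, count each player once, tally by month) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the date format "%Y-%m-%d" is assumed, since the raw format of Date_of_birth is not given here, and the rows are made up.

```python
from collections import Counter
from datetime import datetime


def birth_month_counts(rows, date_format="%Y-%m-%d"):
    """Count players per birth month (January first, December last)."""
    # The data has one row per player per season, so deduplicate on
    # (name, birthday) to count each player only once.
    players = {(r["Name"], r["Date_of_birth"]) for r in rows}
    months = Counter(
        datetime.strptime(dob, date_format).month for _, dob in players
    )
    return [months.get(m, 0) for m in range(1, 13)]


# Hypothetical rows: Player A appears in two seasons.
rows = [
    {"Name": "Player A", "Date_of_birth": "1990-01-15"},
    {"Name": "Player A", "Date_of_birth": "1990-01-15"},
    {"Name": "Player B", "Date_of_birth": "1988-01-03"},
    {"Name": "Player C", "Date_of_birth": "1992-08-20"},
]
counts = birth_month_counts(rows)
print(counts[0], counts[7])  # 2 1
```

The resulting list of twelve counts is what a birth-month histogram would display as bar heights.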
Body Features of NHL Players
- For the body features of NHL players, we decided to explore the height, weight, and BMI in six positions: Center, Defense, Forward, Goaltender, Left-Wing, and Right-Wing. We first group the data by position. Then, we create boxplots of the body features, faceted by players’ positions.

- Over 75% of NHL players’ heights are over 180 cm, which is almost 5 feet 11 inches (2 inches above the average height for a US male). The median heights of Center, Forward, Goaltender, Left Wing, and Right Wing players are similar and close to 185 cm. The median height of Defense players is about the same as the upper quartile (the value above which the tallest 25% fall) of the other positions. This indicates that the height distribution of Defense players is shifted upward, so they are more likely to be taller than players at other positions.
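The comparison of medians and quartiles across positions, and the centimeter-to-feet conversion used above, can be sketched as follows. The helper names and the sample heights are hypothetical, and Python's statistics.quantiles may use a slightly different quartile convention than the boxplots in our figures.

```python
import statistics
from collections import defaultdict


def height_quartiles_by_position(rows):
    """Return [Q1, median, Q3] of Height for each Position."""
    by_position = defaultdict(list)
    for r in rows:
        by_position[r["Position"]].append(r["Height"])
    return {
        pos: statistics.quantiles(heights, n=4)
        for pos, heights in by_position.items()
    }


def cm_to_feet_inches(cm):
    """Convert centimeters to (feet, inches rounded to one decimal)."""
    total_inches = cm / 2.54
    feet = int(total_inches // 12)
    return feet, round(total_inches - 12 * feet, 1)


# Hypothetical heights in cm (illustrative values only).
rows = [
    {"Position": "Defense", "Height": h} for h in (183, 188, 191, 193)
] + [
    {"Position": "Center", "Height": h} for h in (180, 183, 185, 188)
]
quartiles = height_quartiles_by_position(rows)
print(quartiles["Defense"][1], quartiles["Center"][1])  # 189.5 184.0
print(cm_to_feet_inches(180))  # (5, 10.9)
```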

- The weights of most NHL players are similar: the medians for all positions fall in the range of 85-95 kg (187-210 lbs), which is close to the average male weight (197 lbs). While there isn’t much variation by position, the median weights for Center, Forward, and Goaltender players are around the 1st quartile (the value below which the lightest 25% fall) of the weight distributions of the remaining positions (Defense, Left-Wing, Right-Wing), showing that players at Defense, Left-Wing, Right-Wing are mo