Instructor: Hyunseung Kang
Email: khyunsWHALE@whartonWHALE.upenn.edu (remove all marine mammals from the e-mail address)
Office: 434 JMHH
Office hours: Mon/Tues/Wed/Thur 12:15 p.m. - 1:30 p.m.
Syllabus: Syllabus

Updates

Course Overview

This course aims to equip students with the tools needed to analyze real-world data and to justify the use of those tools through mathematical theory. Together, we will study basic concepts related to statistical inference and examine commonly used methods, with an emphasis on understanding when and how to apply them. Students will also learn how to use these methods in the statistical software R.

Prerequisites

The official prerequisite for the course is STAT 430. The effective prerequisite is fluency with basic probabilistic reasoning and analysis (e.g., probability distributions and densities; joint distributions; conditional probability, independence, correlation, and covariance; moment generating functions; law of large numbers; central limit theorem). For a refresher or overview of these topics, please refer to A First Course in Probability by Sheldon Ross.

It would be helpful to have previous exposure to linear algebra, but it is not required. Previous exposure to the statistical computing software R is also not required.

Textbook

There is no required textbook for this course. The course material will largely draw on the best parts of each textbook listed below and will be presented through lectures and lecture notes. However, if you wish to purchase a textbook, Devore is available at the Bookstore.

All the textbooks are on reserve at the Lippincott Library in Van Pelt.

Statistical Computing

The statistical computing software R (latest version) will be used in the course. It is free and can be downloaded from the R-project website, http://www.r-project.org. The website also contains a list of manuals for using the software. Basic usage of R will be illustrated in class and through sample code posted on the course website. Again, no previous exposure to the software is required.

Grading Policy

Assignments will be handed out every Monday and will be due the following Monday before class begins. Weekly quizzes will be given every Monday at the beginning of class. They will be 15 minutes long and will be based on the previous week's lectures and assignment.

Final Project (Due Thur, August 9th)

In the final project, students will analyze a real-world data set of their choosing using the tools learned in class. The final project should focus on what statistical tools were used, whether the tools were appropriate in the setting, and why the tools were important in the analysis. Students may also develop new tools for analysis, as long as they are justified by theory.

Students may work in groups of up to three people. Each group will submit a one-page executive summary (single-spaced, 12-point type, 1-inch margins) providing an overview of the project. The group will also submit a technical report containing the details of the group's analysis. Both documents must be in a single PDF file (no .txt, .doc, .docx, .tex, etc.). In the technical report, students are expected to provide some mathematical justification of their analysis and include relevant numerical analysis (e.g., p-values, t-tests, F-tests) of the data set.

If students are interested and the quality of the analysis is exceptional, the instructor will help them get the final project published in an academic journal.

Also, here is a list of websites where you can obtain freely accessible data sets. This is only a small fraction of what's available online.

Lecture Notes

Homework

Quiz

Links