Stat 992: Course Logistics and Overview of Prerequisites

Author

Hyunseung Kang

Published

January 21, 2025

Key Items from the Syllabus

Course website: My homepage (Stat 992, Spring 2025)
Target audience: Ph.D. students in statistics
Office hours:
1. Walk-ins whenever I’m available (1245B Medical Sciences)
2. By appointment (e-mail me)
Grading:
1. Summarize one paper; see syllabus

Goal of the Course

The main goal is to prepare students for research in causal inference.

Build intuition behind causal inference (e.g., confounding, counterfactuals, missing data)
Learn how to identify causal estimands
Learn how to estimate/infer causal estimands

Prerequisites

You need to know probability, math stats, linear models, and statistical computing at the graduate level (e.g., Casella and Berger (2002), Lehmann (1999), Rizzo (2019)).

Specific concepts are listed below.

A. Probability: conditional independence/probability/expectation, law of total expectation, convergence of random variables (e.g., LLN, CLT), rates of convergence, continuous mapping theorem

B. Math Stats: likelihood-based estimation and inference (e.g., MLE, Cramér-Rao bound), parametric and nonparametric two-sample tests (e.g., t-test, permutation test)

C. Linear Models: linear models (projection-based), GLMs

D. Computation: bootstrap, cross validation, designing simulations to numerically evaluate estimators and tests (e.g., bias, Type I error rate)
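To make item D concrete, here is a minimal sketch of a simulation that estimates the Type I error rate of a two-sample permutation test. It is an illustration only and not part of the course materials: the function names (`permutation_pvalue`, `estimate_type1_error`), the sample sizes, the number of permutations, and the seed are all hypothetical choices, and Python is used purely for convenience.

```python
import random
import statistics


def permutation_pvalue(x, y, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled sample.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            exceed += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return (1 + exceed) / (1 + n_perm)


def estimate_type1_error(n_reps=300, n=10, n_perm=199, alpha=0.05, seed=992):
    """Estimate the rejection rate when both groups share one distribution."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    rejections = 0
    for _ in range(n_reps):
        # Data are generated under the null: both samples are N(0, 1).
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if permutation_pvalue(x, y, n_perm, rng) <= alpha:
            rejections += 1
    rate = rejections / n_reps
    # Monte Carlo standard error of the estimated rejection rate.
    mc_se = (rate * (1 - rate) / n_reps) ** 0.5
    return rate, mc_se


if __name__ == "__main__":
    rate, mc_se = estimate_type1_error()
    print(f"Estimated Type I error: {rate:.3f} (Monte Carlo SE {mc_se:.3f})")
```

The estimated rate should sit near the nominal level `alpha`, up to Monte Carlo error. Reporting the Monte Carlo standard error alongside the estimate makes it clear whether the number of replications is large enough to distinguish, say, 0.05 from 0.07. The same loop structure can be reused to evaluate bias by replacing the rejection indicator with the estimation error of an estimator.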

My Go-To Reference Books

Serfling (1980), Lehmann (1999), and Lehmann (2006) (Appendix)
1. For CLTs/LLNs under finite and super-population setups
2. For properties of U-statistics (e.g., rank tests)
3. Rates of convergence are described intuitively in Lehmann (1999).
Newey and McFadden (1994) (Sections 2, 3, 6)
1. M-estimation with estimated nuisance parameters
Wooldridge (2010) (Chapters 1-5)
1. Deriving asymptotic properties of regression estimators
Van der Vaart (2000)
1. For semiparametric efficiency theory¹
2. For properties of M-estimators and empirical process theory

Other Prerequisites

Cauchy-Schwarz inequality and the triangle inequality
Taylor series approximation
Multivariable calculus and basic real analysis
1. Open/closed/compact sets
2. Inf/sup/liminf/limsup, norms
3. Definitions of limits, continuous functions, and derivatives
Linear algebra
1. Linear span, column space, rank of a matrix, inverse, determinants
2. Orthogonal projections
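As a quick self-check on the linear algebra items, the sketch below projects a vector onto the span of two columns by solving the 2x2 normal equations, then verifies that the residual is orthogonal to both columns. This is the geometric picture behind projection-based linear models. The helper names (`dot`, `project_onto_two_columns`) and the toy data are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_two_columns(y, x1, x2):
    """Orthogonal projection of y onto span{x1, x2} via the normal equations."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = a * c - b * b  # determinant of the 2x2 Gram matrix
    if abs(det) < 1e-12:
        raise ValueError("columns are (nearly) linearly dependent")
    # Solve [[a, b], [b, c]] @ [beta1, beta2] = [r1, r2] with the 2x2 inverse.
    beta1 = (c * r1 - b * r2) / det
    beta2 = (a * r2 - b * r1) / det
    fitted = [beta1 * u + beta2 * v for u, v in zip(x1, x2)]
    return (beta1, beta2), fitted


# Toy data (made up): an intercept column and one covariate.
ones = [1.0] * 5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.2, 7.1, 8.8]

coef, fitted = project_onto_two_columns(y, ones, x)
residual = [yi - fi for yi, fi in zip(y, fitted)]

# The residual is orthogonal to every column we projected onto.
print(abs(dot(residual, ones)) < 1e-9, abs(dot(residual, x)) < 1e-9)

# Projecting the fitted values again changes nothing (idempotence).
_, refitted = project_onto_two_columns(fitted, ones, x)
print(all(abs(f - g) < 1e-9 for f, g in zip(fitted, refitted)))
```

With an intercept column and one covariate, the coefficients returned here coincide with the least-squares intercept and slope, which is why orthogonal projection and rank conditions on the design matrix come up repeatedly when deriving properties of regression estimators.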

References

Casella, George, and Roger L Berger. 2002. Statistical Inference. Duxbury Press.

Hines, Oliver, Oliver Dukes, Karla Diaz-Ordaz, and Stijn Vansteelandt. 2022. “Demystifying Statistical Learning Based on Efficient Influence Functions.” The American Statistician 76 (3): 292–304.

Kennedy, Edward H. 2022. “Semiparametric Doubly Robust Targeted Double Machine Learning: A Review.” arXiv Preprint arXiv:2203.06469.

Lehmann, Erich Leo. 1999. Elements of Large-Sample Theory. Springer.

———. 2006. Nonparametrics: Statistical Methods Based on Ranks. Springer.

Newey, Whitney K. 1990. “Semiparametric Efficiency Bounds.” Journal of Applied Econometrics 5 (2): 99–135.

Newey, Whitney K, and Daniel McFadden. 1994. “Large Sample Estimation and Hypothesis Testing.” Handbook of Econometrics 4: 2111–2245.

Rizzo, Maria L. 2019. Statistical Computing with R. Chapman and Hall/CRC.

Serfling, Robert J. 1980. Approximation Theorems of Mathematical Statistics. John Wiley & Sons.

Van der Vaart, Aad W. 2000. Asymptotic Statistics. Vol. 3. Cambridge University Press.

Wooldridge, Jeffrey. 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press.

Footnotes

1. There are now great references on this topic: Alejandro’s book, Hines et al. (2022), Kennedy (2022), and Newey (1990).
