Stat 992: Course Logistics and Prerequisites

Author

Hyunseung Kang

Published

April 3, 2024

Key Items from the Syllabus

  • Course website: Canvas and my homepage
  • Target audience: Ph.D. students in statistics
  • Office hours:
    1. Walk-ins whenever I’m available (1245B Medical Sciences)
    2. By appointment (E-mail: hyunseung@stat.wisc.edu)
  • Grading:
    1. One assignment, due May 3, 2024, at 5:00pm Central.
    2. See the course webpage for details.

Goal of the Course

The main goal is to prepare students for research in causal inference.

  1. Build intuition behind causal inference (e.g. confounding, counterfactuals, missing data)
  2. Learn how to identify causal estimands:
    1. Under what conditions do we have \(\text{Causal Effect} = g(\text{observed data})\) for some function \(g\)?
    2. Deals with population-level quantities (i.e. no randomness)
  3. Learn how to estimate/infer causal estimands:
    1. How should we estimate \(g\), ideally with minimal assumptions?
    2. How should we test \(H_0: \text{Causal Effect} = 0\)?
    3. Deals with randomness from sampling, experimental design, etc.
  4. Learn how to conduct numerical evaluations for causal questions (a short R sketch follows this list):
    1. How do you simulate data for causal inference?
    2. What empirical metrics should you be looking for? (e.g. covariate balance, overlap)
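
To make the last goal concrete, here is a minimal R sketch of the kind of simulation this course has in mind; the data-generating process, the variable names, and the linear adjustment are illustrative assumptions, not course material. It simulates a confounded treatment and computes two common empirical diagnostics (covariate balance and overlap).

```r
set.seed(992)
n <- 2000
x <- rnorm(n)                        # baseline covariate (a confounder)
a <- rbinom(n, 1, plogis(x))         # treatment assignment depends on x
y <- 1 * a + 2 * x + rnorm(n)        # true causal effect of a is 1

mean(y[a == 1]) - mean(y[a == 0])    # naive difference in means: biased upward by confounding
coef(lm(y ~ a + x))["a"]             # adjusting for x recovers approximately 1

# Empirical diagnostics
tapply(x, a, mean)                   # covariate balance: mean of x in each treatment arm
summary(plogis(x))                   # overlap: are the propensity scores bounded away from 0 and 1?
```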

Probability Prerequisites (Non-Asymptotic)

You need to know probability at the level of an advanced statistics undergraduate student (e.g. Stat 309, Math/Stat 431, Ross (2010)).

  1. Definition of conditional probability and conditional expectations
  2. Conditional independence¹: If \(X \perp Y \mid Z\), then for any functions \(f\) and \(g\):
    1. \(f(X) \perp g(Y) \mid Z\)
    2. \(\mathbb{E}[f(X)g(Y)|Z] = \mathbb{E}[f(X)|Z]\mathbb{E}[g(Y)|Z]\)
    3. \(\mathbb{E}[f(X)|Y,Z] = \mathbb{E}[f(X)|Z]\)
  3. Law of total expectation (a quick numerical check follows this list):
    1. \(\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X | Y]]\)
    2. \(\mathbb{E}[X | Y] = \mathbb{E}[\mathbb{E}[X|Y,Z] | Y]\)
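
As a quick sanity check of the tower property, the following R sketch (the specific distributions for \(X\) and \(Y\) are illustrative assumptions) compares a Monte Carlo estimate of \(\mathbb{E}[X]\) with one of \(\mathbb{E}[\mathbb{E}[X|Y]]\) for a binary \(Y\); the two numbers should agree up to simulation error.

```r
set.seed(992)
n <- 1e6
y <- rbinom(n, 1, 0.3)                            # binary Y with P(Y = 1) = 0.3
x <- rnorm(n, mean = 2 * y)                       # X | Y = y ~ N(2y, 1), so E[X] = 0.6

mean(x)                                           # direct estimate of E[X]
sum(tapply(x, y, mean) * prop.table(table(y)))    # E[E[X|Y]]: within-group means weighted by P(Y = y)
```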

Probability Prerequisites (Asymptotic)

  1. Limit theorems: Suppose \(X_i \overset{\text{i.i.d.}}{\sim} F\), where \(F\) has finite mean and finite variance \(\sigma^2\) (a simulation check follows this list).
    1. LLN: \(n^{-1} \sum_{i=1}^{n} X_i \overset{\rm P}{\to}\mathbb{E}[X_i]\)
    2. CLT: \(n^{-1/2} \sum_{i=1}^{n} (X_i - \mathbb{E}[X_i]) \overset{\rm D}{\to}N(0,\sigma^2)\)
  2. Continuous mapping theorem: For any continuous function \(f(\cdot)\), if \(X_n \overset{{\rm D} \text{ or } {\rm P}}{\longrightarrow}X\), then \(f(X_n) \overset{{\rm D} \text{ or } {\rm P}}{\longrightarrow}f(X)\).
  3. Slutsky’s theorem: Let \(Y_n \overset{\rm P}{\to}c\) where \(c\) is a constant. If \(X_n \overset{{\rm D} \text{ or } {\rm P}}{\longrightarrow}X\), then \(X_nY_n \overset{{\rm D} \text{ or } {\rm P}}{\longrightarrow}Xc\) and \(X_n + Y_n \overset{{\rm D} \text{ or } {\rm P}}{\longrightarrow}X +c\).
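
For concreteness, here is a minimal R check of the CLT; the Exp(1) draws, the sample size, and the number of replications are illustrative choices. Studentizing by the sample standard deviation instead of \(\sigma\) is justified by the LLN, the continuous mapping theorem, and Slutsky's theorem.

```r
set.seed(992)
n <- 500; reps <- 2000
z <- replicate(reps, {
  x <- rexp(n, rate = 1)               # Exp(1): mean 1, variance 1
  sqrt(n) * (mean(x) - 1) / sd(x)      # studentized sample mean
})

c(mean = mean(z), sd = sd(z))          # should be close to 0 and 1
qqnorm(z); qqline(z)                   # should look approximately like a straight line
```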

Math Stats/Stat Methods Prerequisites

You need to know math stats at the level of an advanced statistics undergraduate (e.g. Stat 310). Ideally, you should know math stats at the level of Casella and Berger (2002).

  1. Generalized linear models (e.g. linear models, logistic regression)
  2. Maximum likelihood estimators (e.g. efficiency, Fisher information, Cramer-Rao)
  3. Hypothesis testing (e.g. Wald test, likelihood ratio test)
  4. Parametric and nonparametric two-sample tests (e.g. two-sample t-test, Wilcoxon rank-sum test, permutation test); a short R illustration follows this list.
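
To make item 4 concrete, here is a short R illustration of the three tests on simulated data; the data-generating choices and the number of permutations are illustrative assumptions.

```r
set.seed(992)
x <- rnorm(30)                               # control sample
y <- rnorm(30, mean = 0.5)                   # treated sample with a shifted mean

t.test(y, x)                                 # parametric two-sample t-test
wilcox.test(y, x)                            # Wilcoxon rank-sum test

# Permutation test based on the difference in means
obs <- mean(y) - mean(x)
pooled <- c(x, y)
perm <- replicate(5000, {
  idx <- sample(length(pooled), length(y))   # randomly relabel the treated units
  mean(pooled[idx]) - mean(pooled[-idx])
})
mean(abs(perm) >= abs(obs))                  # two-sided permutation p-value
```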

My go-to reference books: Serfling (1980), Newey and McFadden (1994) (Sections 2, 3, and 6), Lehmann (1999), Wooldridge (2010) (Chapters 1-5), and Van der Vaart (2000).

Computational Prerequisites

  1. You should know some R.
  2. You should know how to simulate data and empirically evaluate the following (see the sketch after this list):
    1. Properties of estimators (e.g. bias, variance)
    2. Properties of statistical tests (e.g. Type I error rate, power, coverage of confidence intervals)
  3. You should know how to create reasonably informative plots or tables.
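
As a rough template for these evaluations (the normal-mean example, the 95% level, and the Monte Carlo settings are illustrative assumptions), the following R sketch estimates the bias and variance of the sample mean, the coverage of its t-based confidence interval, and the Type I error rate of the corresponding test.

```r
set.seed(992)
n <- 100; reps <- 2000; mu <- 0               # H0: mu = 0 is true, so rejections are Type I errors
crit <- qt(0.975, df = n - 1)

res <- replicate(reps, {
  x <- rnorm(n, mean = mu)
  est <- mean(x); se <- sd(x) / sqrt(n)
  c(est    = est,
    cover  = abs(est - mu) <= crit * se,      # does the 95% CI cover the truth?
    reject = abs(est / se) > crit)            # does the t-test reject H0: mu = 0?
})

c(bias     = mean(res["est", ]) - mu,
  variance = var(res["est", ]),
  coverage = mean(res["cover", ]),            # should be close to 0.95
  type1    = mean(res["reject", ]))           # should be close to 0.05
```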

Other Prerequisites

  1. Rates of convergence: \(X_n = O_p(n^{-1/2})\) versus \(X_n = o_p(n^{-1/2})\) (an illustration follows this list)
  2. Chebyshev’s inequality, the Cauchy-Schwarz inequality, and the triangle inequality
  3. Taylor series approximation
  4. Multivariable calculus and basic real analysis
    1. Open/closed/compact sets
    2. Inf/sup/liminf/limsup, norms
    3. Definitions of limits, continuous functions, and derivatives
  5. Linear algebra
    1. Linear span, column space, rank of a matrix, inverse, determinants
    2. Orthogonal projections
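
As a minimal illustration of the rates notation, using the i.i.d. finite-variance setup from the asymptotic prerequisites above with \(\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i\):

\[
\bar{X}_n - \mathbb{E}[X_i] = O_p(n^{-1/2}) \quad \text{(by the CLT)}, \qquad \text{whereas} \qquad n^{-1} = o_p(n^{-1/2}).
\]

That is, the centered sample mean is stochastically bounded at the \(n^{-1/2}\) scale but, when \(\sigma^2 > 0\), does not vanish faster than that scale, while the deterministic sequence \(n^{-1}\) does.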

My Go-To Reference Books

  1. Serfling (1980), Lehmann (1999), and Lehmann (2006) (Appendix)
    1. For CLTs/LLNs under finite and super-population setups
    2. For properties of U-statistics (e.g. rank tests)
    3. Rates of convergence are described intuitively in Lehmann (1999).
  2. Newey and McFadden (1994) (Sections 2, 3, and 6)
    1. For M-estimation with estimated nuisance parameter
  3. Wooldridge (2010) (Chapters 1-5)
    1. For deriving asymptotics of regression estimators
  4. Van der Vaart (2000)
    1. For semiparametric efficiency theory²
    2. For properties of M-estimators and empirical process theory

References

Casella, George, and Roger L. Berger. 2002. Statistical Inference. Duxbury Press.
Dawid, A. P. 1979. “Conditional Independence in Statistical Theory.” Journal of the Royal Statistical Society. Series B (Methodological) 41 (1): 1–31.
Hines, Oliver, Oliver Dukes, Karla Diaz-Ordaz, and Stijn Vansteelandt. 2022. “Demystifying Statistical Learning Based on Efficient Influence Functions.” The American Statistician 76 (3): 292–304.
Kennedy, Edward H. 2022. “Semiparametric Doubly Robust Targeted Double Machine Learning: A Review.” arXiv Preprint arXiv:2203.06469.
Lehmann, Erich Leo. 1999. Elements of Large-Sample Theory. Springer.
———. 2006. Nonparametrics: Statistical Methods Based on Ranks. Springer.
Newey, Whitney K. 1990. “Semiparametric Efficiency Bounds.” Journal of Applied Econometrics 5 (2): 99–135.
Newey, Whitney K., and Daniel McFadden. 1994. “Large Sample Estimation and Hypothesis Testing.” Handbook of Econometrics 4: 2111–2245.
Ross, Sheldon. 2010. A First Course in Probability. 8th ed. Pearson.
Serfling, Robert J. 1980. Approximation Theorems of Mathematical Statistics. John Wiley & Sons.
Van der Vaart, Aad W. 2000. Asymptotic Statistics. Vol. 3. Cambridge University Press.
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press.

Footnotes

  1. See Section 3.1 and Section 4 of Dawid (1979) for a concise list of implications arising from conditional independence.

  2. There are now several great references on this topic: Alejandro’s book, Hines et al. (2022), Kennedy (2022), and Newey (1990).