Stat 992: Causal Inference (Assignment)

Background

The goal of the assignment is to demonstrate your understanding of how to identify, estimate, and simulate a problem in causal inference. This is the only graded assignment for the course. You must submit the assignment by Friday, May 3rd, 2024 (5:00pm Central) to the course Canvas website.

Deliverables

You will be submitting a 2- to 5-page document that addresses the following questions:

1. What causal estimand are you interested in studying? Some examples include:
   1. Average treatment effect (ATE)
   2. Average treatment effect on the treated (ATT)
   3. Complier/local average treatment effect (LATE)
   4. Static, single-point optimal treatment regime (OTR)
   5. Natural direct/indirect effects (NDEs; NIEs)
   6. Longitudinal ATEs
2. How is the observed data generated? Some examples include:
   1. A randomized controlled trial (RCT)
   2. An observational study with or without strong ignorability (e.g. positivity/overlap violation; presence of unmeasured confounding)
   3. An RCT and an observational study without strong ignorability
   4. Repeated cross-sectional studies
   5. Panel data or longitudinal study with censored outcome
   6. Regression discontinuity designs
   7. Staggered adoption designs
3. How do you identify the causal estimand with the observed data? You need to have the following components:
   1. A clear statement of the assumptions in terms of potential outcomes
   2. A formal proof of identification
   3. A brief discussion about the plausibility of the assumptions in your setting
4. How do you estimate or conduct hypothesis testing about the causal estimand? You can use any estimator/test of your choice (e.g. regression, IPW, MLE, M-estimation, machine learning estimator, permutation test, etc.), but you need to have the following components:
   1. A clear description of the estimator and/or the statistical test
   2. For estimation: a formal proof that your estimator is consistent and asymptotically Normal (CAN)
   3. For testing: a formal proof that your statistical test is consistent and has proper size control
   4. Optional: a proof that your estimator is optimal (e.g. semiparametrically efficient; efficient among a class of linear estimators)
5. Conduct a small simulation study or a real data analysis to numerically demonstrate your estimator. Some quantities you can show in your numerical analysis include:
   1. For estimation: bias, variance
   2. For testing: Type I error control, power, coverage rate of confidence intervals
   3. For computation: compute time, memory usage
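
As one illustration of the kind of numerical analysis the last question asks for, here is a minimal sketch of a simulation study, written in Python (which the FAQ allows in place of R). Everything in it is an assumption made for illustration: the data-generating process is a hypothetical RCT with a constant treatment effect `tau`, the estimator is a plain difference in means, and the function names (`simulate_rct`, `diff_in_means`, `run_simulation`) are made up for this example. It reports the Monte Carlo bias and variance of the estimator and the coverage rate of a 95% Wald confidence interval.

```python
import numpy as np


def simulate_rct(n, tau, rng):
    """Draw one hypothetical RCT: A ~ Bernoulli(0.5), Y = 1 + tau * A + N(0, 1) noise."""
    a = rng.binomial(1, 0.5, size=n)
    y = 1.0 + tau * a + rng.normal(size=n)
    return a, y


def diff_in_means(a, y):
    """Difference-in-means estimate of the ATE with an unpooled standard error."""
    y1, y0 = y[a == 1], y[a == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return est, se


def run_simulation(n=500, tau=2.0, reps=2000, seed=0):
    """Repeat the experiment `reps` times and summarize the estimator's behavior."""
    rng = np.random.default_rng(seed)
    ests = np.empty(reps)
    covered = np.empty(reps, dtype=bool)
    for r in range(reps):
        a, y = simulate_rct(n, tau, rng)
        est, se = diff_in_means(a, y)
        ests[r] = est
        # Does the 95% Wald interval est +/- 1.96 * se contain the true ATE?
        covered[r] = abs(est - tau) <= 1.96 * se
    return {
        "bias": float(ests.mean() - tau),
        "variance": float(ests.var(ddof=1)),
        "coverage": float(covered.mean()),
    }


results = run_simulation()
print(results)
```

In this hypothetical design the estimator's variance should be close to 4/n, the bias should be near zero, and the coverage should be near 0.95, up to Monte Carlo error. A fuller study would typically vary `n` and the data-generating process.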

For question 5, you must include R code that replicates your numerical analysis. Also, purely for the sake of your learning, I discourage you from using existing R packages/code that implement the estimator. But, I recognize that not all of you have the time; if you are in this situation, please feel free to use existing code/software and cite it. You should, however, write your own code for the simulation study.

You can read a paper on causal inference and answer the questions above as part of reading/understanding the paper; this is highly encouraged.¹ Alternatively, you can study a question discussed in class and answer the questions above as long as you explain them in your own words. For example, if you decide to focus on studying the ATE from an observational study where strong ignorability holds, you should explain, in your own words, how to identify the ATE and estimate the ATE with a doubly robust estimator. Also, please cite relevant sources.
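
To make the doubly robust example above concrete, here is a minimal sketch of an augmented IPW (AIPW) estimator of the ATE, written in Python (which the FAQ allows in place of R). It is a sketch under stated assumptions, not a prescribed solution: the simulated data, the logistic propensity score model, the linear outcome models, and the function names (`fit_logistic`, `fit_linear_predict`, `aipw_ate`) are all choices made for this illustration. The standard error is the usual plug-in based on the sample variance of the estimated influence-function terms.

```python
import numpy as np


def fit_logistic(x, a, iters=25):
    """Fit a logistic regression of a on x by Newton-Raphson; return fitted propensity scores."""
    design = np.column_stack([np.ones(len(a)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (a - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-design @ beta))


def fit_linear_predict(x, y, mask):
    """Fit OLS of y on x using rows where mask is True; return predictions for all rows."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
    return design @ coef


def aipw_ate(x, a, y):
    """AIPW estimate of the ATE and a plug-in standard error."""
    e = fit_logistic(x, a)                   # estimated propensity score
    mu1 = fit_linear_predict(x, y, a == 1)   # outcome model fitted among the treated
    mu0 = fit_linear_predict(x, y, a == 0)   # outcome model fitted among the controls
    # Per-unit terms: outcome-model contrast plus inverse-probability-weighted residuals.
    phi = mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1.0 - e)
    est = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(len(y))
    return est, se


# Hypothetical observational study: one measured confounder x, true ATE equal to 2.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))
y = 1.0 + 2.0 * a + x + rng.normal(size=n)

ate_hat, se_hat = aipw_ate(x, a, y)
print(f"AIPW estimate: {ate_hat:.3f} (SE {se_hat:.3f})")
```

Here both nuisance models happen to be correctly specified, so the estimate should land near 2. Misspecifying one of the two models in the simulation (for example, dropping `x` from the outcome regression) is a simple way to see the double robustness property numerically.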

Finally, some problems in causal inference do not necessarily fit into the above framework (e.g. sensitivity analysis). If you plan to work in such areas, come talk to me and we can discuss alternative questions that are appropriate for the causal problem you want to address.

Grading

You can submit an R Markdown file, an HTML file, or a PDF. You must submit to Canvas (i) a single file that answers the questions above and (ii) for PDF submissions, the relevant R code.

You’ll be graded on (a) completeness and (b) accuracy. I view the assignment as a means to help you learn about causal inference and as such, it’s okay to have a couple of mistakes (e.g. some parts of the proof are wrong; some assumptions are not stated correctly; code has bugs). In general, I expect most of you to get an A. But, some reasons for not getting an A include (but are not limited to):

You didn’t do the assignment.
You didn’t answer all of the questions above.
The document is difficult to read (e.g. not organized, horrible plots/tables, issues with spelling/grammar/style).
You did the questions, but there are several big mistakes.
You violated the standards for academic integrity.

FAQs (Last Update: Apr. 3, 2024)

1. Can I discuss the assignment with other people in the class?
   Certainly! But, you must turn in your own work and follow the standards for academic integrity.
2. Can I use my ongoing research project as part of completing the assignment?
   Yes.
3. Can the assignment be longer than five pages?
   Yes. But, going over 10 pages will make it difficult for me to read by the grading deadline (i.e. 10 days after the due date).
4. Can I use Python instead of R? Can I use Jupyter Notebooks or Quarto instead of R Markdown?
   Yes.
5. Do you have some list of papers?
   I would start with the references in the lecture notes. If they do not interest you, come talk to me.

¹ These are the five questions that I think about when I read/review causal papers.

Hyunseung Kang
