Background
The goal of the assignment is to demonstrate your understanding of
how to identify, estimate, and simulate a problem in causal inference.
This is the only graded assignment for the course. You must submit the
assignment by Friday, May 3rd, 2024 (5:00pm Central) to
the course Canvas website.
Deliverables
You will be submitting a 2 to 5 page document that addresses the
following questions:
- What causal estimand are you interested in studying? Some examples
  include:
  - Average treatment effect (ATE)
  - Average treatment effect on the treated (ATT)
  - Complier/local average treatment effect (LATE)
  - Static, single-point optimal treatment regime (OTR)
  - Natural direct/indirect effects (NDEs; NIEs)
  - Longitudinal ATEs
- How is the observed data generated? Some examples include:
  - A randomized controlled trial (RCT)
  - An observational study with or without strong ignorability
    (e.g. positivity/overlap violation; presence of unmeasured
    confounding)
  - An RCT and an observational study without strong ignorability
  - Repeated cross-sectional studies
  - Panel data or longitudinal study with censored outcome
  - Regression discontinuity designs
  - Staggered adoption designs
- How do you identify the causal estimand with the observed data? You
  need to have the following components:
  - A clear statement of the assumptions in terms of potential
    outcomes
  - A formal proof of identification
  - A brief discussion about the plausibility of the assumptions in
    your setting
- How do you estimate or conduct hypothesis testing about the causal
  estimand? You can use any estimator/test of your choice (e.g.
  regression, IPW, MLE, M-estimation, machine learning estimator,
  permutation test). However, you need to have the following
  components:
  - A clear description of the estimator and/or the statistical test
  - For estimation: a formal proof that your estimator is consistent
    and asymptotically Normal (CAN)
  - For testing: a formal proof that your statistical test is
    consistent and has proper size control
  - Optional: Prove that your estimator is optimal (e.g.
    semiparametrically efficient; efficient among a class of linear
    estimators)
- Conduct a small simulation study or a real data analysis to
  numerically demonstrate your estimator. Some quantities you can show
  in your numerical analysis include:
  - For estimation: bias, variance
  - For testing: Type I error control, power, coverage rate of
    confidence intervals
  - For computation: compute time, memory usage
For question 5 (the simulation study or real data analysis), you must
include R code that replicates your numerical analysis. Also, purely
for the sake of your learning, I discourage you from using existing R
packages/code that implement the estimator. That said, I recognize
that not all of you have the time; if you are in this situation,
please feel free to use existing code/software and cite it. You
should, however, write your own code for the simulation study.
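As a rough illustration of what a small simulation study for question 5
might report, the sketch below estimates the ATE in a simulated RCT with
a difference-in-means estimator and summarizes bias, variance, and
confidence interval coverage over repeated draws. It is only a sketch
under stated assumptions: the data-generating process (standard Normal
control outcomes, a constant effect `tau`, Bernoulli(1/2) assignment),
the sample size, the seed, and all function names are hypothetical
choices made for illustration. It is written in Python with the standard
library only, which the FAQ below permits; the same structure carries
over directly to R.

```python
import math
import random
import statistics


def simulate_rct(n, tau, rng):
    """Draw one simulated RCT of size n (hypothetical data-generating process)."""
    treated, control = [], []
    for _ in range(n):
        a = rng.random() < 0.5        # Bernoulli(1/2) treatment assignment
        y0 = rng.gauss(0.0, 1.0)      # potential outcome under control
        y = y0 + tau if a else y0     # observed outcome: Y(1) = Y(0) + tau
        (treated if a else control).append(y)
    return treated, control


def diff_in_means(treated, control):
    """Difference-in-means estimate and its estimated standard error."""
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return est, se


def monte_carlo(n=200, tau=1.0, reps=1000, seed=2024):
    """Summarize bias, variance, and 95% CI coverage over repeated trials."""
    rng = random.Random(seed)         # fixed seed so the run is reproducible
    estimates, covered = [], 0
    for _ in range(reps):
        est, se = diff_in_means(*simulate_rct(n, tau, rng))
        estimates.append(est)
        # Count how often the Wald-type 95% interval contains the true effect.
        covered += (est - 1.96 * se <= tau <= est + 1.96 * se)
    return {
        "bias": statistics.fmean(estimates) - tau,
        "variance": statistics.variance(estimates),
        "coverage": covered / reps,
    }


results = monte_carlo()
print(results)
```

Under this design one would expect the bias to be close to zero and the
coverage to be close to 95%. A fuller study would typically vary the
sample size and the data-generating process and report the results in a
table or plot.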
You can read a paper on causal inference and answer the questions
above as part of reading/understanding the paper; this is highly
encouraged. Alternatively, you can study a question
discussed in class and answer the questions above as long as you
explain them in your own words. For example, if you decide to focus
on studying the ATE from an observational study where strong
ignorability holds, you should explain, in your own words, how to
identify the ATE and estimate the ATE with a doubly robust estimator.
Also, please cite relevant sources.
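To make the example above concrete, here is a sketch of the kind of
identification argument and estimator it refers to. The notation is an
assumption made for illustration, since this handout does not fix any:
$A \in \{0,1\}$ is the treatment, $X$ the measured covariates, $Y$ the
observed outcome, $Y(a)$ the potential outcomes,
$\mu_a(X) = E[Y \mid A = a, X]$ the outcome regression, and
$e(X) = P(A = 1 \mid X)$ the propensity score. Your own write-up should
state and justify each step in your own words.

```latex
% Assumptions (stated in terms of potential outcomes):
%   (A1) Consistency:            Y = Y(A)
%   (A2) Conditional ignorability: Y(a) \perp A \mid X  for a = 0, 1
%   (A3) Positivity:             0 < e(X) < 1 almost surely
\begin{align*}
E[Y(a)]
  &= E\{ E[Y(a) \mid X] \}          && \text{(iterated expectations)} \\
  &= E\{ E[Y(a) \mid A = a, X] \}   && \text{(A2); well defined by (A3)} \\
  &= E\{ E[Y \mid A = a, X] \}      && \text{(A1)} \\
  &= E[\mu_a(X)],
\end{align*}
so that $\mathrm{ATE} = E[Y(1) - Y(0)] = E[\mu_1(X) - \mu_0(X)]$.

% One doubly robust (augmented IPW) estimator based on fitted
% \hat{\mu}_a and \hat{e}:
\[
\hat{\tau}_{\mathrm{DR}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{A_i \{ Y_i - \hat{\mu}_1(X_i) \}}{\hat{e}(X_i)}
      - \frac{(1 - A_i) \{ Y_i - \hat{\mu}_0(X_i) \}}{1 - \hat{e}(X_i)}
    \right].
\]
```

The estimator is called doubly robust because it remains consistent if
either the outcome regression model or the propensity score model is
correctly specified, not necessarily both.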
Finally, some problems in causal inference do not necessarily fit
into the above framework (e.g. sensitivity analysis). If you plan to
work in such areas, come talk to me and we can discuss alternative
questions that are appropriate for the causal problem you want to
address.
Grading
You can submit an R Markdown file, an HTML file, or a PDF. You must
submit to Canvas (i) a single file that answers the questions above
and (ii) for PDF submissions, the relevant R code.
You’ll be graded on (a) completeness and (b) accuracy. I view the
assignment as a means to help you learn about causal inference and as
such, it’s okay to have a couple of mistakes (e.g. some parts of the
proof are wrong; some assumptions are not stated correctly; code has
bugs). In general, I expect most of you to get an A. But, some reasons
for not getting an A include (but are not limited to):
- You didn’t do the assignment.
- You didn’t answer all of the questions above.
- The document is difficult to read (e.g. not organized, horrible
plots/tables, issues with spelling/grammar/style).
- You did the questions, but there are several big mistakes.
- You violated the standards
for academic integrity.
FAQs (Last Update: Apr. 3, 2024)
- Can I discuss the assignment with other people in the class?
  - Certainly! But, you must turn in your own work and follow the
    standards for academic integrity.
- Can I use my ongoing research project as part of completing the
  assignment?
  - Yes.
- Can the assignment be longer than five pages?
  - Yes. But, going over 10 pages will make it difficult for me to
    read by the grading deadline (i.e. 10 days after the due date).
- Can I use Python instead of R? Can I use Jupyter Notebooks or Quarto
  instead of R Markdown?
  - Yes.
- Do you have a list of papers?
  - I would start with the references in the lecture notes. If they do
    not interest you, come talk to me.