Causal Inference: Identification Under Unmeasured Confounding

Author

Hyunseung Kang

Abstract

Most, if not all, observational studies do not satisfy ignorability because of the presence of unmeasured confounders. In this document, we discuss some popular assumptions to identify causal estimands when ignorability fails to hold. These new assumptions impose additional constraints on the heterogeneity of counterfactual outcomes as a function of unmeasured confounders and/or require new variables.

Review: Ignorability and Observational Studies

In the previous lecture, we identified various causal estimands under the following set of assumptions:

  • (A1, SUTVA): \(Y = A Y(1) + (1-A) Y(0)\)
  • (A2, Conditional randomization of \(A\)): \(A \perp Y(1), Y(0) | X\)
  • (A3, Positivity/Overlap): \(0 < \mathbb{P}(A=1 | X=x) < 1\) for all \(x\)

Assumptions (A2) and (A3) are referred to as strong ignorability. Let \(\mu_{a}(X) = \mathbb{E}[Y \mid A=a,X]\). Under (A1)-(A3), we showed that the ATE can be identified as \[\begin{align*} {\rm ATE} &= \mathbb{E}[Y(1) - Y(0)] \\ &= \mathbb{E}[\mathbb{E}[Y \mid A=1,X]] - \mathbb{E}[\mathbb{E}[Y \mid A=0,X]] \\ &= \mathbb{E}[\mu_1(X)] - \mathbb{E}[\mu_0(X)]. \end{align*}\]
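To make the plug-in (standardization) formula above concrete, here is a minimal simulation sketch in Python; the data-generating process and all variable names are hypothetical illustrations, not part of the lecture. We generate data satisfying (A1)-(A3), fit arm-specific regressions for \(\mu_1(X)\) and \(\mu_0(X)\), and average them over the empirical distribution of \(X\).

```python
# Minimal sketch (hypothetical DGP): plug-in estimate of the ATE,
# E[mu_1(X)] - E[mu_0(X)], when (A1)-(A3) hold.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

X = rng.normal(size=n)                      # measured confounder
p = 1 / (1 + np.exp(-X))                    # P(A=1 | X), bounded away from 0 and 1 (A3)
A = rng.binomial(1, p)                      # conditionally randomized treatment (A2)
Y1 = 2 + X + rng.normal(size=n)             # counterfactual outcome under treatment
Y0 = X + rng.normal(size=n)                 # counterfactual outcome under control
Y = A * Y1 + (1 - A) * Y0                   # SUTVA / consistency (A1)

# Estimate mu_a(X) with a simple linear outcome regression within each arm.
def fit_linear(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    return lambda xnew: intercept + slope * xnew

mu1_hat = fit_linear(X[A == 1], Y[A == 1])
mu0_hat = fit_linear(X[A == 0], Y[A == 0])

ate_plugin = np.mean(mu1_hat(X)) - np.mean(mu0_hat(X))
print(ate_plugin)                           # close to the true ATE of 2
print(np.mean(Y1 - Y0))                     # oracle ATE, only available in simulation
```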

While (A1)-(A3) are plausible in a stratified experiment, they are implausible in an observational study where the investigator no longer controls the treatment assignment and the measured covariates \(X\) are likely insufficient to satisfy ignorability.

When Does Strong Ignorability Fail?

Some Examples

There are many ways for strong ignorability to fail. Here, we list some examples:

  • Lack of overlap: For some values of \(x\), we may have \(\mathbb{P}(A=1 | X=x) = 0\). If this occurs, it’s usually appropriate to estimate the average treatment effect among a subset of values of \(X\), say \(\mathcal{X}\), where overlap holds.

It’s possible that overlap holds in the population, but there is an in-sample violation of overlap where \(\mathbb{P}(A=1 | X=x)\) is estimated to be close to (but not exactly equal to) \(0\) for some values of \(x\). This will affect estimation of the average treatment effect, and we’ll discuss this more in subsequent lectures.

  • Unmeasured confounding in an observational study: People select themselves into treatment (or control) based on measured covariates \(X\) and unmeasured covariates \(U\). More formally, strong ignorability holds with covariates \(X\) and \(U\): \[ {\rm (B2):}\ A \perp Y(1), Y(0) | X, U \quad \text{ and } \quad {\rm (B3):} \ 0 < \mathbb{P}(A=1 | X=x,U=u) < 1 \text{ for all } x, u\]
  • Imperfect/Quasi-randomized experiment: Consider a randomized experiment to study the causal effect of a new drug. Individuals are randomized to the new drug or the placebo. However, after randomization, individuals can choose to not take the new drug (or the placebo); this behavior is sometimes referred to as non-compliance in the causal inference literature because the individuals are not complying with the result of the initial randomization. If the scientific goal is to study the causal effect of taking the new drug versus taking the placebo, the act of taking the drug (or placebo) is not randomized. That is, if \(A\) denotes whether the individual actually took the treatment, we can have \[A \not\perp Y(1), Y(0) | X\] However, if the scientific goal is to study the causal effect of being assigned to take the new drug versus the placebo, the treatment assignment is randomized and (A1), (A2), and (A3) hold with \(A\) defined as the treatment assignment.

In both cases, we may have \(\mathbb{E}[Y(1) - Y(0)] \neq \mathbb{E}[\mu_1(X)] - \mathbb{E}[\mu_0(X)]\).

Consequences of Violating Strong Ignorability

Suppose (A2) and (A3) fail and instead, (B2) and (B3) hold. Let \(\mu_a(X,U) = \mathbb{E}[Y \mid A=a,X,U]\). Following the previous lectures, we can identify the ATE as \[\begin{align*} \mathbb{E}[Y(1) - Y(0)] &= \mathbb{E}[\mathbb{E}[Y|A=1,X,U]] - \mathbb{E}[\mathbb{E}[Y|A=0,X,U]] \\ &= \mathbb{E}[\mu_1(X,U)] - \mathbb{E}[\mu_0(X,U)] \\ &= \mathbb{E}[\mu_1(X,U) - \mu_1(X) + \mu_1(X)] - \mathbb{E}[\mu_0(X,U) - \mu_0(X) + \mu_0(X)] \\ &= \mathbb{E}[\mu_1(X,U) - \mu_1(X)] - \mathbb{E}[\mu_0(X,U) - \mu_0(X)] + \mathbb{E}[\mu_1(X) - \mu_0(X)] \end{align*}\] Rearranging the equality above, the causal bias incurred by identifying the ATE via \(\mathbb{E}[\mu_1(X) - \mu_0(X)]\) as if (A1)-(A3) held, when in reality (A1), (B2), and (B3) are true, is \[ \underbrace{{\rm ATE} - \mathbb{E}[\mu_1(X) - \mu_0(X)]}_{\text{"Causal bias"}} = \mathbb{E}[ \underbrace{\{ \mu_1(X,U) - \mu_0(X,U) \}}_{\text{CATE of $X$ and $U$}} - \underbrace{\{\mu_1(X) - \mu_0(X)\}}_{\text{CATE of $X$}}] \] CATE stands for the conditional average treatment effect, which we discussed in the previous lecture.

There are some interesting implications of the “causal bias” formula above:

  • If the CATE defined by \(X\) and \(U\) does not vary too much as a function of \(U\), \(\mu_1(X,U) - \mu_0(X,U) \approx \mu_1(X) - \mu_0(X)\) and we would have a small causal bias. In other words, if the treatment effect is not too heterogeneous with respect to \(U\), then we would have a small causal bias.
  • Suppose the difference between \(\mu_a(X,U)\) and \(\mu_a(X)\) is at most \(\Gamma \geq 0\), i.e. \(|\mu_a(X,U) - \mu_a(X)| \leq \Gamma\) for all \(X, U, a\). Then, we can get a lower bound and an upper bound of the ATE where the bounds are functions of the observed data (i.e. \(Y, A, X\)): \[ \mathbb{E}[\mu_1(X) -\mu_0(X)] - 2\Gamma \leq {\rm ATE} \leq \mathbb{E}[\mu_1(X) -\mu_0(X)] + 2\Gamma \] A numerical sketch of both points follows this list.
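To make the causal bias and the \(\Gamma\)-bounds concrete, here is a minimal simulation sketch in Python; the data-generating process is a hypothetical illustration, not part of the lecture. The unmeasured \(U\) both drives treatment uptake and modifies the treatment effect, so \(\mathbb{E}[\mu_1(X) - \mu_0(X)]\) is biased for the ATE; by construction \(|\mu_a(X,U) - \mu_a(X)| \leq 1\), so the ATE stays inside the interval with \(\Gamma = 1\).

```python
# Minimal sketch (hypothetical DGP): causal bias from an unmeasured confounder U,
# and the crude bounds E[mu1(X) - mu0(X)] -/+ 2*Gamma on the ATE.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

X = rng.binomial(1, 0.5, size=n)            # measured covariate
U = rng.binomial(1, 0.5, size=n)            # unmeasured confounder
p = 1 / (1 + np.exp(-(X + 2 * U - 1.5)))    # treatment depends on X and U
A = rng.binomial(1, p)
Y1 = 1 + X + U + rng.normal(size=n)         # treatment effect varies with U
Y0 = X + rng.normal(size=n)
Y = A * Y1 + (1 - A) * Y0                   # SUTVA / consistency (A1)

ate_true = np.mean(Y1 - Y0)                 # oracle ATE, roughly 1.5

# E[mu_1(X)] - E[mu_0(X)]: group means within (A, X) cells, averaged over X.
x_only = 0.0
for x in (0, 1):
    mu1 = Y[(A == 1) & (X == x)].mean()
    mu0 = Y[(A == 0) & (X == x)].mean()
    x_only += np.mean(X == x) * (mu1 - mu0)

print(ate_true, x_only, ate_true - x_only)  # nonzero "causal bias"

gamma = 1.0                                 # here |mu_a(X,U) - mu_a(X)| <= 1 for a = 0, 1
print(x_only - 2 * gamma, "<=", ate_true, "<=", x_only + 2 * gamma)
```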

Identification Without Strong Ignorability: Instrumental Variables (IVs)

Instrumental variables (IVs) are a popular approach to identify a causal estimand when ignorability does not hold; see Hernán and Robins (2006) and Baiocchi, Cheng, and Small (2014) for a review. Roughly speaking, the IV approach relies on finding a variable \(Z\), called an instrument, where

  • \(Z\) is related to the treatment,
  • \(Z\) is independent from all unmeasured confounders that affect the outcome and the treatment, and
  • \(Z\) is related to the outcome via the treatment.

Here, we discuss two approaches to making the statements about \(Z\) precise.

Randomized Encouragement Designs

Motivation: Causal Effect of Smoking During Pregnancy (Sexton and Hebel (1984), Permutt and Hebel (1989))

Sexton and Hebel (1984) studied the causal effect of maternal smoking on birth weight. Because randomizing pregnant mothers to smoking (or non-smoking) is unethical, the authors considered an experimental design that randomized the encouragement to quit smoking. Specifically,

  1. Randomly assign some mothers to an encouragement intervention (i.e. \(Z=1\)) or the usual care (i.e. \(Z=0\)). The encouragement intervention encouraged mothers not to smoke through information, support, and practical guidance, on top of the usual care.
  2. Observe mothers’ smoking status, where \(A=1\) denotes that the mother is not smoking during pregnancy and \(A=0\) denotes that the mother is smoking during pregnancy.
  3. Observe the birth weight of the newborn, denoted as \(Y\).

We refer to \(Z\) as the treatment assignment variable or the instrument. We refer to \(A\) as the treatment receipt variable.

This type of experimental design is referred to as a randomized encouragement design because the encouragement (or lack thereof) was randomized.

Defining Counterfactuals

To define causal effects in a randomized encouragement design, let \(A(z)\) denote the counterfactual treatment receipt under treatment variable \(z\) and \(Y(a,z)\) denote the counterfactual outcome under treatment variable \(z\) and treatment receipt \(a\). In the maternal smoking example above:

  • \(A(1)\): Counterfactual smoking status if the mother was encouraged to stop smoking
  • \(A(0)\): Counterfactual smoking status if the mother was not encouraged to stop smoking (i.e. the usual care)
  • \(Y(1,1)\): Counterfactual birthweight if the mother was encouraged to stop smoking and the mother stopped smoking
  • \(Y(1,0)\): Counterfactual birthweight if the mother was under the usual care and the mother stopped smoking
  • \(Y(0,1)\): Counterfactual birthweight if the mother was encouraged to stop smoking and the mother kept smoking
  • \(Y(0,0)\): Counterfactual birthweight if the mother was under the usual care and the mother kept smoking

Assumptions

We make the following assumptions

  • (IV1, SUTVA): \(A = ZA(1) + (1-Z)A(0)\) and \(Y=ZY(A(1),1) + (1-Z)Y(A(0),0)\)
  • (IV2, Ignorable instrument): \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0), A(1), A(0)\)
  • (IV3, Overlap/positivity on instrument): \(0 < P(Z=1) <1\)

Assumption (IV1) says we get to observe the counterfactuals that correspond to the observed value of the instrument \(Z\). Specifically, we only get to observe the counterfactual \(Y(a,z)\) that corresponds to the observed \(Z=z\) and \(A(Z) = A = a\). Also, note that there is no interference for the counterfactual outcome \(Y\) or the counterfactual treatment receipt \(A\).

Assumption (IV2) says that the instrument (i.e. \(Z\)) was completely randomized. This is the case in the randomized encouragement design above where the encouragement intervention (i.e. the instrument) was completely randomized.

Assumption (IV3) says that all values of the instrument have a non-zero probability of being realized. This is also the case in the randomized encouragement design above where some mothers were randomized to the encouragement intervention while other mothers were randomized to the usual care.

In short, (IV1)-(IV3) are conceptually identical to (A1)-(A3) where the treatment \(A\) is replaced by the instrument \(Z\).

We can also interpret assumptions (IV1), (IV2), and (IV3) using the data table that includes both counterfactuals \(Y(a,z), A(z)\) and observed variables \(Z,A,Y\):

|       | \(Y(1,1)\) | \(Y(1,0)\) | \(Y(0,1)\) | \(Y(0,0)\) | \(A(1)\) | \(A(0)\) | \(A\) | \(Z\) | \(Y\) |
|-------|------------|------------|------------|------------|----------|----------|-------|-------|-------|
| Chloe | 15         | NA         | NA         | NA         | 1        | NA       | 1     | 1     | 15    |
| Sally | NA         | NA         | 20         | NA         | 0        | NA       | 0     | 1     | 20    |
| Kate  | NA         | NA         | NA         | 18         | NA       | 0        | 0     | 0     | 18    |
| Julie | NA         | 25         | NA         | NA         | NA       | 1        | 1     | 0     | 25    |

The variables \(Z\) and \(A\) both serve as missingness indicators. But, we only make assumptions about the missingness indicator \(Z\) via (IV2) and (IV3); we don’t make any assumptions about the missingness indicator \(A\). In other words, assumptions (IV2) and (IV3) say that the missingness in the columns \(A(1)\) and \(A(0)\) is completely at random (MCAR), as the missingness in these columns is determined by \(Z\) only. This also means that identifying the causal effect of \(Z\) on \(A\) amounts to identifying the ATE as in previous lectures, where the counterfactual outcomes are replaced by \(A(1), A(0)\). But, the missingness in the columns of \(Y(\cdot,\cdot)\) may not be completely at random; for these columns’ missingness to be MCAR, we have to assume \(A, Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0)\).

Assumptions (IV2) and (IV3) can have conditional counterparts where we condition on pre-instrument covariates \(X\), i.e. 

  • (IV2.c): \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0), A(1), A(0) \mid X\)
  • (IV3.c): \(0 < \mathbb{P}(Z=1 \mid X=x) <1\) for all \(x\)

The conditional versions of (IV2) and (IV3) would be plausible if the investigator conducted a stratified randomized encouragement design where randomization of \(Z\) was done within pre-defined blocks of individuals defined by \(X\); this is similar to a stratified randomized experiment from the previous lecture, except the randomization is done on \(Z\) instead of \(A\). Also, as long as the treatment assignment is well-defined (i.e., no versions of \(Z\) and no interference on \(Z\)), assumptions (IV1)-(IV3) will hold under a randomized encouragement design.

Under a randomized encouragement design, we can formalize the assumptions about the instrument mentioned above as follows:

  • (IV4, Non-zero causal effect): \(\mathbb{E}[A(1) - A(0)] \neq 0\)
  • (IV5, Exclusion restriction): \(Y(a,1) = Y(a,0) =Y(a)\) for all \(a\)
  • (IV6, Monotonicity/No Defiers): \(\mathbb{P}(A(1) \geq A(0))=1\)

Assumption (IV4) states that the instrument has a non-zero, average effect on the treatment receipt. In the maternal smoking example, (IV4) states that the encouragement intervention caused more mothers to quit smoking during pregnancy. Under (IV1)-(IV3), this assumption can be re-written based on the observed data, i.e. \(\mathbb{E}[A(1) -A(0)] = \mathbb{E}[A\mid Z=1] - \mathbb{E}[A \mid Z= 0] \neq 0\), and thus, can be assessed with the observed data.

Assumption (IV5) states that after fixing \(a\), the counterfactual outcomes are identical between \(z=1\) and \(z=0\). In the maternal smoking example, (IV5) states that after fixing the mother’s smoking status, whether the mother was encouraged or not does not affect the birthweight of the newborn. Some remarks:

  • Unlike (IV4), this assumption cannot be written as a function of the observed data as it requires observing both \(Y(a,1)\) and \(Y(a,0)\). In other words, (IV5) cannot be directly assessed with the observed data, but testable implications exist (i.e. if (IV5) holds, the observed data must satisfy certain constraints); see page 1173 in Balke and Pearl (1997) and Theorem 1 of Wang, Robins, and Richardson (2017) for some examples when the instrument is binary.
  • (IV5) is the most controversial assumption in IV as the other assumptions (IV1)-(IV4) and (IV6) can be satisfied by a randomized encouragement design or be directly tested (e.g. (IV4)).
  • This assumption is referred to as exclusion restriction (Imbens and Angrist (1994), Angrist, Imbens, and Rubin (1996)).

Assumption (IV6) states that the instrument has a monotonic effect on the treatment receipt. See the next section for an alternative approach to interpreting this assumption.

Angrist, Imbens, and Rubin (1996) and Compliance Types

To interpret assumption (IV6), it’s useful to partition individuals based on their counterfactuals \(A(0), A(1)\) (Angrist, Imbens, and Rubin (1996), Frangakis and Rubin (2002)). Because each \(A(z)\) takes on two values, there are four possible subgroups of individuals based on the joint values of \(A(0), A(1)\):

| \(A(0)\) | \(A(1)\) | Type          |
|----------|----------|---------------|
| 1        | 1        | Always-Takers |
| 0        | 1        | Compliers     |
| 1        | 0        | Defiers       |
| 0        | 0        | Never-Takers  |

The names associated with each \(A(0), A(1)\) (e.g. always-takers, compliers) come from Table 1 of Angrist, Imbens, and Rubin (1996). In the maternal smoking example,

  • Always-takers are mothers who do not smoke irrespective of whether they were under the encouragement intervention or the usual care.
  • Compliers are mothers who do not smoke when they were under the encouragement intervention, but would smoke if they were under the usual care.
  • Never-takers are mothers who smoke irrespective of whether they were under the encouragement intervention or the usual care.
  • Defiers are mothers who do not smoke when they are under the usual care, but smoke when they are under the encouragement intervention.

Assumption (IV6) rules out the existence of defiers in the study population, i.e. individuals who would not take the treatment if randomly assigned to it, but otherwise take the treatment if assigned to the control.

An important point from this table is that we cannot classify each individual in the study population as an always-taker, complier, or never-taker, as this requires observing both \(A(1)\) and \(A(0)\). However, we can identify the column means of \(A(1)\) and \(A(0)\) under (IV1)-(IV3) via \(\mathbb{E}[A(z)] = \mathbb{E}[A \mid Z=z]\). Formally, for \(Z=1\), \[\begin{align*} \mathbb{E}[A \mid Z=1] &= \mathbb{E}[ZA(1) + (1-Z)A(0) \mid Z=1] && \text{(IV1)}\\ &=\mathbb{E}[A(1) \mid Z=1] \\ &= \mathbb{E}[A(1)] && \text{(IV2)} \end{align*}\] Note that (IV3) is needed to ensure that the conditional expectation that conditions on \(\{Z=1\}\) is well-defined. By a similar argument, we have \(\mathbb{E}[A \mid Z= 0]= \mathbb{E}[A(0)]\).

The above result implies that under (IV6) where defiers do not exist, we can identify the proportion of always-takers as \[\begin{align*} \mathbb{P}({\rm Always-takers}) &= \mathbb{P}(A(0) = 1) && \text{(IV6)} \\ &= \mathbb{E}[A \mid Z=0] && \text{(IV1)-(IV3)} \end{align*}\] By a similar argument, we can identify the proportion of never-takers as \[\begin{align*} \mathbb{P}({\rm Never-takers}) &= \mathbb{P}(A(1) = 0) && \text{(IV6)} \\ &= 1- \mathbb{P}(A(1) = 1) \\ &= 1 - \mathbb{E}[A \mid Z=1] && \text{(IV1)-(IV3)} \end{align*}\] Finally, we can identify the proportion of compliers as one minus the proportion of always-takers and never-takers: \[\begin{align*} &\mathbb{P}({\rm Compliers}) \\ =& 1 - \left( \mathbb{P}({\rm Always-takers}) + \mathbb{P}({\rm Never-takers}) \right) && \text{(IV6)} \\ =& \mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] && \text{See above} \end{align*}\]
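The three identification results above can be checked numerically. Here is a minimal simulation sketch in Python; the latent type proportions and variable names are hypothetical illustrations, not part of the lecture. We generate compliance types directly, randomize \(Z\), construct \(A\) by consistency, and recover the type proportions from the observed \((A, Z)\) alone.

```python
# Minimal sketch (hypothetical type proportions): under (IV1)-(IV3) and (IV6),
# the proportions of always-takers, never-takers, and compliers are identified.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Latent types: 0 = never-taker, 1 = complier, 2 = always-taker (no defiers, IV6).
types = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
A0 = (types == 2).astype(int)                     # A(0): only always-takers take treatment
A1 = (types >= 1).astype(int)                     # A(1): compliers and always-takers
Z = rng.binomial(1, 0.5, size=n)                  # randomized instrument (IV2)-(IV3)
A = Z * A1 + (1 - Z) * A0                         # consistency (IV1)

p_always = A[Z == 0].mean()                       # E[A | Z = 0]
p_never = 1 - A[Z == 1].mean()                    # 1 - E[A | Z = 1]
p_complier = A[Z == 1].mean() - A[Z == 0].mean()  # difference of the two

print(p_always, p_never, p_complier)              # close to 0.2, 0.3, 0.5
```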

In some experimental designs, we can enforce (IV6) by blocking access to treatment for all individuals who are randomized to the control \(Z=0\), i.e.,

  • (IV6.0, One-Sided Noncompliance): \(A(0) = 0\)

One-sided non-compliance is plausible in settings where \(Z\) represents a new program under evaluation and \(A\) represents actual enrollment into the new program. In these settings, those who are not randomized into the new program (i.e. \(Z=0\)) usually cannot enroll into the new program. In contrast, those who are randomized into the new program (i.e. \(Z=1\)) can choose to enroll (i.e. \(A=1\)) or not enroll (i.e. \(A=0\)) into the program. Note that (IV6.0) implies (IV6).

Causal Estimand: The Local Average Treatment Effect (LATE)

Under (IV1)-(IV6), we can identify the average treatment effect among the complier subpopulation. This quantity is sometimes referred to as the local average treatment effect (LATE) (Imbens and Angrist (1994), Angrist, Imbens, and Rubin (1996)).

\[ {\rm LATE} = \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=1, A(0)=0}_{\text{Compliers}}] \] In the maternal smoking example, \({\rm LATE}\) is the average effect of not smoking (versus smoking) during pregnancy among complier mothers (i.e. mothers who stop smoking if they were under the encouragement intervention, but smoke if they were under the usual care).

  • From the discussion in the previous section, we cannot determine which type (i.e. complier, always-taker, or never-taker) each individual belongs to. In other words, the LATE is the average treatment effect among a subgroup of individuals defined by latent classes. In contrast, the CATE is the average treatment effect among a subgroup of individuals defined by the observed \(X\).
  • There is a healthy debate about whether LATE is a useful estimand or not (Hernán and Robins (2006),Deaton (2010),Imbens (2010), Imbens (2014),Baiocchi, Cheng, and Small (2014),Swanson and Hernán (2014)). I personally think the identification of the LATE provides one clear illustration about the difficulty of studying the average treatment effect when strong ignorability fails to hold.

The proof of identifying the LATE is as follows. First, we have \[\begin{align*} \mathbb{E}[Y \mid Z=1] &= \mathbb{E}[Z Y(A(1),1) + (1-Z) Y(A(0),0) \mid Z=1] && \text{(IV1)} \\ &= \mathbb{E}[Y(A(1),1)\mid Z= 1] && \\ &= \mathbb{E}[Y(1,1)A(1) + Y(0,1)(1-A(1)) \mid Z=1] \\ &= \mathbb{E}[Y(1,1)A(1) + Y(0,1)(1-A(1))] && \text{(IV2)} \\ &= \mathbb{E}[Y(1)A(1) + Y(0)(1-A(1))] && \text{(IV5)} \end{align*}\] Note that (IV3) is needed to ensure that the conditional expectation that conditions on \(\{Z=1\}\) is well-defined. By a similar argument, we have \(\mathbb{E}[Y \mid Z=0] = \mathbb{E}[Y(1)A(0) + Y(0)(1-A(0))]\).

We can also identify the causal effect of \(Z\) on the outcome \(Y\), which is often called the intent-to-treat (ITT) effect \[ {\rm ITT} = \mathbb{E}[Y(A(1),1) - Y(A(0),0)] \] The ITT effect is also written as \(\mathbb{E}[Y(1)-Y(0)]\) where the counterfactual outcome is re-defined so that \(Y(z) = Y(A(z),z)\). In words, the ITT effect measures the causal effect of the initial random assignment on the outcome, where the initial random assignment represents the investigator’s intent to assign treatment (or control) to the individual. In the maternal smoking example, the ITT effect is the effect of the encouragement intervention on the newborn’s birthweight; it does not directly measure the effect of maternal smoking on the newborn’s birthweight.

The identification of the ITT effect follows from assumptions (IV1)-(IV3), i.e.  \[\begin{align*} \mathbb{E}[Y \mid Z=1] &= \mathbb{E}[Z Y(A(1),1) + (1-Z) Y(A(0),0) \mid Z=1] && \text{(IV1)} \\ &= \mathbb{E}[Y(A(1),1)\mid Z= 1] &&\\ &=\mathbb{E}[Y(A(1),1)] && \text{(IV2)} \end{align*}\] Note that (IV3) is needed to ensure that the conditional expectation that conditions on \(\{Z=1\}\) is well-defined. By a similar argument, we have \(\mathbb{E}[Y \mid Z=0] = \mathbb{E}[Y(A(0),0)]\).

The ITT effect is often reported in many randomized experiments and IV studies where the instrument was randomized.

Second, taking the difference between \(\mathbb{E}[Y \mid Z=1]\) and \(\mathbb{E}[Y \mid Z= 0]\), we get \[\begin{align*} &\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0] \\ =& \mathbb{E}[ \{Y(1)A(1) + Y(0)(1-A(1))\} - \{Y(1)A(0) + Y(0)(1-A(0))\}] \\ =&\mathbb{E}[Y(1)\{A(1) - A(0)\} - Y(0)\{A(1) - A(0)\}] \\ =&\mathbb{E}[\{Y(1) - Y(0)\}\{A(1) - A(0)\}] \\ =& \mathbb{E}[\{Y(1) - Y(0)\} I(A(1) - A(0) = 1) - \{Y(1) - Y(0)\} I(A(1) - A(0) = -1) ] \\ =&\mathbb{E}[Y(1) - Y(0) | A(1) - A(0) = 1] \mathbb{P}(A(1) - A(0) = 1) && \text{(IV6)} \end{align*}\] The last equality uses (IV6), which implies \(\mathbb{P}(A(1) - A(0) = -1) = 0\) so the defier term drops, along with the definition of conditional expectation. We can also take the difference between \(\mathbb{E}[A \mid Z=1]\) and \(\mathbb{E}[A \mid Z= 0]\): \[\begin{align*} &\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] \\ =&\mathbb{E}[A(1) - A(0)] && \text{(IV1)-(IV3)} \\ =&\mathbb{P}(A(1) - A(0) = 1) && \text{(IV6)} \end{align*}\]

Finally, under (IV4), the denominator of the ratio of the two differences is non-zero, so we can take the ratio and arrive at \[\begin{align*} &\frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] } \\ =& \frac{\mathbb{E}[ Y(1) - Y(0) | A(1) - A(0) = 1] \mathbb{P}(A(1) - A(0) = 1) }{\mathbb{P}(A(1) - A(0) = 1)} \\ =&\mathbb{E}[ Y(1) - Y(0) | A(1) - A(0) = 1] \\ =& {\rm LATE} \end{align*}\]
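The identification of the LATE can also be checked by simulation. Here is a minimal sketch in Python, reusing the hypothetical compliance-type setup from above (again, not part of the lecture): treatment effects differ across latent types, the latent type confounds \(A\) and \(Y\), and the Wald ratio, whose numerator is the ITT effect and whose denominator is the proportion of compliers, recovers the complier average effect while the naive treated-versus-untreated comparison does not.

```python
# Minimal sketch (hypothetical DGP): the Wald ratio
# (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0]) recovers the LATE.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

types = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])   # never/complier/always
A0 = (types == 2).astype(int)
A1 = (types >= 1).astype(int)
Z = rng.binomial(1, 0.5, size=n)                           # randomized instrument
A = Z * A1 + (1 - Z) * A0                                  # (IV1), no defiers (IV6)

# Exclusion restriction (IV5): Z enters the outcome only through A.
# Treatment effects differ by latent type; compliers have effect 2.
effect = np.where(types == 1, 2.0, np.where(types == 2, 1.0, 0.5))
Y0 = types + rng.normal(size=n)                            # types confound A and Y
Y1 = Y0 + effect
Y = A * Y1 + (1 - A) * Y0

itt = Y[Z == 1].mean() - Y[Z == 0].mean()                  # ITT effect (numerator)
first_stage = A[Z == 1].mean() - A[Z == 0].mean()          # complier proportion
print(itt / first_stage)                                   # Wald ratio, close to the LATE of 2
print((Y1 - Y0)[types == 1].mean())                        # oracle complier average effect
print(Y[A == 1].mean() - Y[A == 0].mean())                 # naive comparison, confounded
```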

No-Additive Interaction Assumption

This type of instrument assumption restricts the heterogeneity of the treatment effect across levels of the instrument and, implicitly, across the unmeasured confounders. The results we discuss here follow the description in Hernán and Robins (2006); the original idea is from Robins (1994). As you’ll see, the same ratio that identified the LATE identifies the ATT when the instrument assumptions are formulated in a different way.

Roughly speaking, the no-additive-interaction framework does not necessarily assume the existence of the counterfactual \(A(z)\). Instead, I like to think of this framework as treating the instrument as a special, pre-treatment covariate \(Z\) that is endowed with the following properties:

  • (JV1, Causal consistency): \(Y = Y(A,Z)\)
  • (JV2, Exchangeable instrument): \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0)\)
  • (JV3, Positivity): \(0 < \mathbb{P}(Z=1) <1\)
  • (JV4, Instrument relevance): \(Z \not\perp A\)
  • (JV5, Exclusion restriction): \(Y(a,1)=Y(a,0)=Y(a)\) for all \(a\)
  • (JV6, No additive interaction): Suppose (JV5) holds. We have \(\mathbb{E}[Y(1) - Y(0) | Z=1, A=1] = \mathbb{E}[Y(1) - Y(0) | Z=0, A=1]\)

Assumptions (JV1) and (JV2) are similar to assumptions (IV1) and (IV2), except that assumptions about the counterfactual \(A(z)\) are no longer present. Assumptions (JV3) and (IV3) are identical. Also, similar to assumptions (IV2.c) and (IV3.c), we can create conditional versions of (JV2) and (JV3), i.e.:

  • (JV2.c) \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0) \mid X\)
  • (JV3.c) \(0 < \mathbb{P}(Z=1 \mid X=x) <1\) for all \(x\)

Assumption (JV4) states that the instrument is associated with \(A\). In contrast to assumption (IV4), we do not necessarily need a causal effect of \(Z\) on \(A\). Assumption (JV5) is identical to (IV5).

Assumption (JV6) can be interpreted by writing out a saturated model of the conditional expectation in (JV6): \[\mathbb{E}[Y(1) -Y(0) \mid Z=z,A=1] = \beta_{0} + \beta_{1}z\] A saturated model simply means that all of the variation on the left-hand side of the equality (i.e. the conditional expectation) can be explained by the model on the right-hand side of the equality. The term \(\beta_0\) represents the ATT among individuals with \(Z=0\) and the term \(\beta_0 + \beta_1\) represents the ATT among individuals with \(Z=1\). Then, assumption (JV6) can be rewritten as \[\mathbb{E}[Y(1) -Y(0) \mid Z=1,A=1] - \mathbb{E}[Y(1) -Y(0) \mid Z=0,A=1] = \beta_1 = 0\] In other words, the no-additive-interaction assumption says that the ATT is the same among individuals with \(Z =0\) and \(Z=1\).

Now, we are ready to show that the ratio we discussed earlier identifies the ATT, i.e. \[ {\rm ATT} = \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] } \] We begin with the numerator of this ratio. \[\begin{align*} \mathbb{E}[Y \mid Z=z] &= \mathbb{E}[Y(A,Z) \mid Z =z] && \text{(JV1)} \\ &= \mathbb{E}[Y(A) \mid Z=z] && \text{(JV5)} \\ &= \mathbb{E}[Y(1) \mid Z=z,A=1]\mathbb{P}(A=1\mid Z=z) + \mathbb{E}[Y(0) \mid Z=z,A=0]\mathbb{P}(A=0 \mid Z=z) \\ &= \mathbb{E}[Y(1) - Y(0) \mid Z=z,A=1]\mathbb{P}(A=1\mid Z=z) + \mathbb{E}[Y(0) \mid Z=z,A=1]\mathbb{P}(A=1 \mid Z=z) + \mathbb{E}[Y(0) \mid Z=z,A=0]\mathbb{P}(A=0 \mid Z=z) \\ &= \mathbb{E}[Y(1) - Y(0) \mid Z=z,A=1]\mathbb{P}(A=1\mid Z=z) + \mathbb{E}[Y(0) \mid Z=z] && \text{Law of total expectation} \\ &= \mathbb{E}[Y(1) - Y(0) \mid Z=z,A=1]\mathbb{P}(A=1\mid Z=z) + \mathbb{E}[Y(0)] && \text{(JV2)} \end{align*}\] Note that assumption (JV3) is used to ensure that the conditioning event \(\{Z=z\}\) is well-defined. Taking the difference \(\mathbb{E}[Y \mid Z=1]- \mathbb{E}[Y \mid Z=0]\) yields \[\begin{align*} &\mathbb{E}[Y \mid Z=1]- \mathbb{E}[Y \mid Z=0] \\ =& \mathbb{E}[Y(1) - Y(0) \mid Z=1,A=1]\mathbb{P}(A=1\mid Z=1) - \mathbb{E}[Y(1) - Y(0) \mid Z=0,A=1]\mathbb{P}(A=1\mid Z=0) \\ =& \mathbb{E}[Y(1) - Y(0) \mid Z=0,A=1]\left(\mathbb{P}(A=1 \mid Z=1) - \mathbb{P}(A=1 \mid Z=0)\right) && \text{(JV6)} \end{align*}\] Dividing this by \(\mathbb{P}(A=1 \mid Z=1) - \mathbb{P}(A=1 \mid Z=0)\), which must be non-zero by assumption (JV4), and realizing that \(\mathbb{E}[Y(1) -Y(0) \mid Z=0,A=1] =\mathbb{E}[Y(1) -Y(0) \mid A=1]\) from (JV6) gives us the desired result.

This identifies the ATT \[ {\rm ATT} = \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] } \] Recent works have relaxed (JV6) to allow identification of the ATT (or the ATE); see Wang and Tchetgen Tchetgen (2018) and Cui and Tchetgen Tchetgen (2021).
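As with the LATE, this result can be checked with a small simulation sketch in Python; the data-generating process is a hypothetical illustration, not part of the lecture. Here the treatment effect is constant, so (JV6) holds trivially and the ATT equals the ATE; \(U\) confounds \(A\) and \(Y\), \(Z\) is randomized and excluded from the outcome, and the ratio above recovers the ATT while the naive comparison does not.

```python
# Minimal sketch (hypothetical DGP): under (JV1)-(JV6), the ratio
# (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0]) recovers the ATT.
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

U = rng.normal(size=n)                            # unmeasured confounder
Z = rng.binomial(1, 0.5, size=n)                  # instrument independent of U (JV2)
p = 1 / (1 + np.exp(-(Z + U)))                    # Z is associated with A (JV4)
A = rng.binomial(1, p)
Y0 = U + rng.normal(size=n)
Y1 = Y0 + 2.0                                     # constant effect: (JV5) and (JV6) hold
Y = A * Y1 + (1 - A) * Y0                         # consistency (JV1)

num = Y[Z == 1].mean() - Y[Z == 0].mean()
den = A[Z == 1].mean() - A[Z == 0].mean()
print(num / den)                                  # close to the ATT of 2
print(Y[A == 1].mean() - Y[A == 0].mean())        # naive comparison, biased by U
```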

Identification Without Strong Ignorability: Regression Discontinuity Designs (RDDs)

Motivation

Assumptions for Sharp RDDs

Causal Estimand: ATE at the Cutoff

Assumptions for Fuzzy RDDs

Causal Estimand: LATE at the Cutoff

Angrist, Joshua D, Guido W Imbens, and Donald B Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444–55.
Baiocchi, Michael, Jing Cheng, and Dylan S Small. 2014. “Instrumental Variable Methods for Causal Inference.” Statistics in Medicine 33 (13): 2297–2340.
Balke, Alexander, and Judea Pearl. 1997. “Bounds on Treatment Effects from Studies with Imperfect Compliance.” Journal of the American Statistical Association 92 (439): 1171–76.
Cui, Yifan, and Eric Tchetgen Tchetgen. 2021. “A Semiparametric Instrumental Variable Approach to Optimal Treatment Regimes Under Endogeneity.” Journal of the American Statistical Association 116 (533): 162–73.
Deaton, Angus. 2010. “Instruments, Randomization, and Learning about Development.” Journal of Economic Literature 48 (2): 424–55.
Frangakis, Constantine E, and Donald B Rubin. 2002. “Principal Stratification in Causal Inference.” Biometrics 58 (1): 21–29.
Hernán, Miguel A, and James M Robins. 2006. “Instruments for Causal Inference: An Epidemiologist’s Dream?” Epidemiology 17 (4): 360–72.
Imbens, Guido W. 2010. “Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009).” Journal of Economic Literature 48 (2): 399–423.
———. 2014. “Instrumental Variables: An Econometrician’s Perspective.” Statistical Science 29 (3): 323–58.
Imbens, Guido W, and Joshua D Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62 (2): 467–75.
Permutt, Thomas, and J Richard Hebel. 1989. “Simultaneous-Equation Estimation in a Clinical Trial of the Effect of Smoking on Birth Weight.” Biometrics, 619–22.
Robins, James M. 1994. “Correcting for Non-Compliance in Randomized Trials Using Structural Nested Mean Models.” Communications in Statistics-Theory and Methods 23 (8): 2379–2412.
Sexton, Mary, and J Richard Hebel. 1984. “A Clinical Trial of Change in Maternal Smoking and Its Effect on Birth Weight.” JAMA 251 (7): 911–15.
Swanson, Sonja A, and Miguel A Hernán. 2014. “Think Globally, Act Globally: An Epidemiologist’s Perspective on Instrumental Variable Estimation.” Statistical Science 29 (3): 371–74.
Wang, Linbo, James M Robins, and Thomas S Richardson. 2017. “On Falsification of the Binary Instrumental Variable Model.” Biometrika 104 (1): 229–36.
Wang, Linbo, and Eric Tchetgen Tchetgen. 2018. “Bounded, Efficient and Multiply Robust Estimation of Average Treatment Effects Using Instrumental Variables.” Journal of the Royal Statistical Society Series B: Statistical Methodology 80 (3): 531–50.