Causal Inference: Identification Under Unmeasured Confounding (Instrumental Variables)

Author

Hyunseung Kang

Published

October 9, 2025

Abstract

Most, if not all, observational studies do not satisfy ignorability/conditional exchangeability because of unmeasured confounders, pre-treatment variables that affect both the treatment and the outcome. In this lecture, we discuss one of the most popular approaches to identifying causal effects when ignorability fails: instrumental variables (IVs). Briefly, the IV approach requires finding a variable \(Z\), called an instrument, that satisfies certain assumptions. We cover two ways to formalize these assumptions, one based on monotonicity and another based on no additive interactions.

Concepts Covered Today

  • Identification with an instrument
    • Monotonicity-based approach and the local/complier average treatment effect (LATE)
    • No additive interaction approach
  • Randomized encouragement designs
  • References:
    • Chapter 16 of M. Hernán and Robins (2020)
    • For monotonicity-based approach: Baiocchi, Cheng, and Small (2014)
    • For no additive interaction approach: Wang and Tchetgen Tchetgen (2018)

Review: Strong Ignorability and Observational Studies

We identified various causal estimands under the following assumptions:

  • (A1, SUTVA): \(Y = A Y(1) + (1-A) Y(0)\)
  • (A2, Conditional randomization of \(A\)): \(A \perp Y(1), Y(0) | X\)
  • (A3, Positivity/Overlap): \(0 < \mathbb{P}(A=1 | X=x) < 1\) for all \(x\)

Assumptions (A2) and (A3) are jointly referred to as strong ignorability (Rosenbaum and Rubin (1983)).

Under (A1)-(A3), we showed that the ATE can be identified as \[{\rm ATE} = \mathbb{E}[Y(1) - Y(0)] = \mathbb{E}[\mathbb{E}[Y \mid A=1,X]] - \mathbb{E}[\mathbb{E}[Y \mid A=0,X]]\]
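As a quick sanity check, the standardization formula can be illustrated with a short simulation sketch. The data-generating process below is entirely hypothetical (a binary covariate, a constant treatment effect of 2), but it mimics a stratified randomized experiment where (A1)-(A3) hold:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Hypothetical stratified randomized experiment: A is randomized within
# levels of X, so (A2) and (A3) hold given X.
X = rng.binomial(1, 0.5, n)
A = rng.binomial(1, np.where(X == 1, 0.7, 0.3))

Y0 = 1.0 + 3.0 * X + rng.normal(0, 1, n)   # X also affects the outcome
Y1 = Y0 + 2.0                              # true ATE = 2
Y = A * Y1 + (1 - A) * Y0                  # (A1, SUTVA)

def standardize(a):
    # E[E[Y | A=a, X]]: stratum-specific means weighted by P(X = x)
    return sum(Y[(A == a) & (X == x)].mean() * (X == x).mean() for x in (0, 1))

naive = Y[A == 1].mean() - Y[A == 0].mean()   # ignores X; confounded
ate_hat = standardize(1) - standardize(0)     # standardization; close to 2
```

The unadjusted contrast `naive` is biased upward because \(X\) affects both \(A\) and \(Y\), while the standardized contrast recovers the ATE.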

(A1)-(A3) are plausible in a stratified randomized experiment where treatment is randomized within strata defined by \(X\). But, in an observational study, these assumptions, especially (A2), are often implausible.

This lecture will focus on the failure of (A2).

When Does (A2) Fail? Non-compliance in Randomized Experiments

One way for (A2) to fail is due to non-compliance in a randomized experiment.

Consider a randomized experiment to study the causal effect of a new therapy program versus the “standard” program.

  1. Participants are randomized to the new therapy program (i.e., \(A = 1\)) or the standard program (i.e., \(A=0\)).
  2. But after randomization, some participants either
    • Drop out of the new program for the standard program or
    • Opt into the new program from the standard program

This is referred to as non-compliance because participants are not “complying” with the initial randomization of treatment.

Suppose the goal is to study the causal effect of using the new program versus not using it.

  • Participant’s usage/adherence is not randomized. Formally,
    • Let \(D \in \{0,1\}\) denote the treatment receipt of an individual
      • \(D = 1\): individual uses the new therapy program.
      • \(D = 0\): individual uses the standard program.
    • Let \(Y(d), d\in \{0,1\}\) denote the counterfactual outcome under the new therapy (i.e., \(Y(1)\)) or the standard program (i.e., \(Y(0)\)).
    • We have \(D \not\perp Y(1), Y(0) \mid X\)

If, however, participant’s usage of the program \(D\) is as-if random after adjusting for measured \(X\), \(D\) will satisfy (A2), i.e., \(D \perp Y(1), Y(0) \mid X\), and we can use previous lectures to identify the causal effect of using the program.

  • For example, if using the new therapy program is effectively random after adjusting for patient’s age and gender, then \(D\) will satisfy (A2).
  • In most cases, investigators rarely believe that \(D\) satisfies (A2) with \(X\).

When Does (A2) Fail? Unmeasured Confounders in Observational Studies

Another way for (A2) to fail is due to the presence of unmeasured confounders \(U\) in an observational study.

  • People select themselves into treatment (or control) based on measured covariates \(X\) and unmeasured covariates \(U\).
  • More formally, strong ignorability holds with \(X\) and \(U\): \[ A \perp Y(1), Y(0) | X, U \quad \text{ and } 0 < \mathbb{P}(A=1 | X=x,U=u) < 1 \text{ for all } x, u\]

In both examples, we no longer have the identification result:

\[\mathbb{E}[Y(1) - Y(0)] \neq \mathbb{E}[\mathbb{E}[Y \mid A=1,X]] - \mathbb{E}[\mathbb{E}[Y \mid A=0,X]]\]

Suppose (A2) and (A3) fail with \(X\), but hold after conditioning on \(X\) and \(U\); see (A2’) and (A3’) below.

  • (A2’) \(A \perp Y(1), Y(0) | X, U\)
  • (A3’) \(0 < \mathbb{P}(A=1 | X=x,U=u) < 1 \text{ for all } x, u\)

Let \(\mu_a(X) = \mathbb{E}[Y \mid A=a,X]\), which we can identify with the observed data. We’ll quantify the “identification bias” incurred when we use \(\mathbb{E}[\mu_1(X)] - \mathbb{E}[\mu_0(X)]\) to identify the ATE \(\mathbb{E}[Y(1) -Y(0)]\):

\[\text{Identification Bias} = \mathbb{E}[Y(1) - Y(0)] - \mathbb{E}[\mu_1(X) - \mu_0(X)]\] Note that from previous lectures, the identification bias is zero if (A2) and (A3) hold with \(X\). But, if only (A2’) and (A3’) hold, this bias essentially depends on \(U\)’s effect on \(Y\).

Let \(\mu_a(X,U) = \mathbb{E}[Y \mid A=a,X,U]\). Following the previous lectures, we can identify the ATE as \[\begin{align*} \mathbb{E}[Y(1) - Y(0)] &= \mathbb{E}[\mathbb{E}[Y|A=1,X,U]] - \mathbb{E}[\mathbb{E}[Y|A=0,X,U]] \\ &= \mathbb{E}[\mu_1(X,U)] - \mathbb{E}[\mu_0(X,U)] \\ &= \mathbb{E}[\mu_1(X,U) - \mu_1(X) + \mu_1(X)] - \mathbb{E}[\mu_0(X,U) - \mu_0(X) + \mu_0(X)] \\ &= \mathbb{E}[\mu_1(X,U) - \mu_1(X)] - \mathbb{E}[\mu_0(X,U) - \mu_0(X)] + \mathbb{E}[\mu_1(X) - \mu_0(X)] \end{align*}\] Or, rearranging the equality above, the bias is \[ \text{Identification Bias} = \mathbb{E}[ \underbrace{\{ \mu_1(X,U) - \mu_0(X,U) \}}_{\text{CATE of $X$ and $U$}} - \underbrace{\{\mu_1(X) - \mu_0(X)\}}_{\text{CATE of $X$}}] \] CATE stands for conditional average treatment effect that we discussed from previous lectures.

There are some interesting implications of the bias formula above

  • If the CATE defined by \(X\) and \(U\) does not vary too much as a function of \(U\), \(\mu_1(X,U) - \mu_0(X,U) \approx \mu_1(X) - \mu_0(X)\) and we would have a small identification bias. In other words, if the treatment effect is roughly homogeneous with respect to the unmeasured confounder \(U\) even after conditioning on \(X\), then the identification bias is small.
  • Suppose the difference between \(\mu_a(X,U)\) and \(\mu_a(X)\) is at most \(\Gamma \geq 0\), i.e., \(|\mu_a(X,U) - \mu_a(X)| \leq \Gamma\) for all \(X,U,a\). Then, we can get a lower bound and an upper bound of the ATE where the bounds are functions of the observed data (i.e., \(Y, A,X\)): \[ \mathbb{E}[\mu_1(X) -\mu_0(X)] - 2\Gamma \leq {\rm ATE} \leq \mathbb{E}[\mu_1(X) -\mu_0(X)] + 2\Gamma \]
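A minimal numerical sketch of the bias formula and the \(\Gamma\)-bounds, under a hypothetical data-generating process where strong ignorability holds only given both \(X\) and \(U\). Here \(U\) is binary and enters the outcome additively with coefficient 1, so \(|\mu_a(X,U) - \mu_a(X)| \leq \Gamma = 1\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

X = rng.binomial(1, 0.5, n)                   # measured confounder
U = rng.binomial(1, 0.5, n)                   # unmeasured confounder
A = rng.binomial(1, 0.2 + 0.3 * X + 0.3 * U)  # U affects the treatment...
Y0 = X + U + rng.normal(0, 1, n)              # ...and the outcome
Y1 = Y0 + 1.0                                 # true ATE = 1
Y = A * Y1 + (1 - A) * Y0

def adjusted(a):
    # E[mu_a(X)] with mu_a(X) = E[Y | A=a, X]: adjusts for X only
    return sum(Y[(A == a) & (X == x)].mean() * (X == x).mean() for x in (0, 1))

est = adjusted(1) - adjusted(0)   # biased upward (around 1.33, not 1)
gamma = 1.0
lower, upper = est - 2 * gamma, est + 2 * gamma  # ATE lies in [lower, upper]
```

Adjusting for \(X\) alone does not remove the confounding by \(U\), but the interval \([\mathrm{est} - 2\Gamma, \mathrm{est} + 2\Gamma]\) still covers the true ATE of 1.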

Identification Without (A2): Instrumental Variables (IVs)

Instrumental variables (IVs) are a popular approach to identify a causal effect when (A2) does not hold; see M. A. Hernán and Robins (2006) and Baiocchi, Cheng, and Small (2014) for a review. Roughly speaking, the IV approach relies on finding a variable \(Z\), called an instrument, where

  • \(Z\) is related to the treatment \(A\),
  • \(Z\) is independent from all unmeasured confounders that affect the outcome \(Y\) and the treatment \(A\), and
  • \(Z\) is only related to the outcome \(Y\) via the treatment \(A\).

Here, we discuss two approaches to making the above statements about the instrument \(Z\) precise.

  1. Monotonicity-based approach
  2. No additive interaction approach

Motivation for Monotonicity-Based Approach: Randomized Encouragement Designs

A classic way to motivate the monotonicity-based approach to defining an instrument and its assumptions is through a randomized encouragement design.

Sexton and Hebel (1984) studied the causal effect of maternal smoking on birth weight. Because randomizing pregnant mothers to smoking is unethical, the authors considered an experimental design that randomized the encouragement to quit smoking.

  1. Randomly assign some mothers to the encouragement intervention (i.e. \(Z=1\)) or the usual care (i.e. \(Z=0\)). The encouragement intervention further informed mothers about the dangers of smoking through additional information, support, and practical guidance.
  2. Observe mothers’ smoking status where \(A=1\) denotes that the mother is not smoking during pregnancy and \(A=0\) denotes that the mother is smoking during pregnancy.
  3. Observe the birth weight of the newborn, denoted as \(Y\).

We refer to \(Z\) as the treatment assignment variable or, as we’ll see later, the instrument in this setup. We refer to \(A\) as the treatment receipt variable. Also, this type of experimental design is referred to as a randomized encouragement design because the encouragement (or lack thereof; \(Z\)) was randomized. But, the treatment receipt \(A\) is not randomized.

  • If the encouragement is 100% successful so that \(Z = A\), we have effectively randomized \(A\) via \(Z\). In practice, this is rarely the case.
  • Nevertheless, the randomization of \(Z\) induces some randomization of \(A\), which we can exploit to obtain some causal effect of \(A\).

Randomized Encouragement Designs: Counterfactuals

To define causal effects in a randomized encouragement design, we define the following counterfactual outcomes

  • \(A(z)\): the counterfactual treatment receipt under instrument \(z\)
  • \(Y(a,z)\): the counterfactual outcome under instrument \(z\) and treatment receipt \(a\).

In the maternal smoking example:

  • \(A(1)\): counterfactual smoking status if the mother was encouraged to stop smoking (i.e., \(z= 1\))
  • \(A(0)\): counterfactual smoking status if the mother was not encouraged to stop smoking (i.e. \(z=0\))
  • \(Y(1,1)\): counterfactual birth weight of the newborn if the mother was encouraged to stop smoking (i.e., \(z=1\)) and the mother stopped smoking (i.e., \(a=1\))
  • \(Y(1,0)\): counterfactual birth weight of the newborn if the mother was under the usual care (i.e., \(z=0\)) and the mother stopped smoking (i.e., \(a=1\))
  • \(Y(0,1)\): counterfactual birth weight of the newborn if the mother was encouraged to stop smoking (i.e., \(z=1\)) and the mother kept smoking (i.e., \(a=0\))
  • \(Y(0,0)\): counterfactual birth weight of the newborn if the mother was under the usual care (i.e., \(z=0\)) and the mother kept smoking (i.e., \(a=0\))

It’s also useful to study the following counterfactuals derived from above:

  • \(Y(A(z),z)\): the counterfactual outcome under instrument \(z\) and treatment receipt if it takes on the value \(A(z)\)
    • Given \(z\), the counterfactual outcome is \(Y(A(z),z)\).
    • This is in contrast to \(Y(a,z)\) where we need to specify both \(a\) and \(z\).
  • When used in the context of defining \(Y(A(z),z)\), \(A(z)\) is sometimes referred to as the “natural value” of \(A\).

In the maternal smoking example:

  • \(Y(A(1),1)\): counterfactual birth weight of the newborn if the mother was encouraged to stop smoking (i.e., \(z=1\)) and the mother’s smoking status was set to her counterfactual smoking status under encouragement \(A(1)\).
  • \(Y(A(0),0)\): counterfactual birth weight of the newborn if the mother was not encouraged to stop smoking (i.e., \(z=0\)) and the mother’s smoking status was set to her counterfactual smoking status under no encouragement \(A(0)\).

Randomized Encouragement Designs: Assumptions

The following assumptions are implied from a randomized encouragement design.

  • (IV1, SUTVA of \(Z\)): \(A = ZA(1) + (1-Z)A(0)\) and \(Y=ZY(A(1),1) + (1-Z)Y(A(0),0)\)
  • (IV2, Ignorable instrument): \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0), A(1), A(0)\)
  • (IV3, Overlap/positivity on instrument): \(0 < P(Z=1) <1\)

Assumption (IV1) says we get to observe the counterfactuals that correspond to the observed value of the instrument \(Z\).

  • For the outcome, we only get to observe the counterfactual outcome \(Y(a,z)\) that corresponds to the observed instrument \(Z=z\), specifically \(Y(A(z),z)\).
  • This is the case in the randomized encouragement design above about smoking where the researcher only has two interventions: the encouragement to quit smoking or the usual care.
  • Note that SUTVA also implies two “mini” assumptions about no multiple versions of treatment and no interference.

Assumption (IV2) says that the instrument (i.e. \(Z\)) was completely randomized.

  • This is the case in the randomized encouragement design about smoking where the encouragement intervention (i.e., \(Z\)) was completely randomized.

Assumption (IV3) says that all values of the instrument have a non-zero probability of being realized.

  • This is also the case in the randomized encouragement design above where some mothers were randomized to the encouragement intervention while other mothers were randomized to the usual care.

In short, (IV1)-(IV3) are conceptually identical to (A1)-(A3) where \(A\) is replaced by \(Z\).

Similar to previous lectures where we generalized a completely randomized experiment into a stratified randomized experiment based on covariates \(X\), we can generalize a randomized encouragement design into a stratified randomized encouragement design.

For example, consider again the smoking and birth weight example above.

  • Instead of completely randomizing who gets the encouragement intervention or the usual care, we randomize the encouragement intervention within pre-defined blocks of mothers.
  • Each block is defined by mothers’ measurable characteristics (e.g., age)
  • Within each block, some mothers get randomized to the encouragement intervention while others get the usual care. Note that the probability of getting the encouragement can differ across blocks.
    • Among mothers who are older than 40, the probability of getting the encouragement intervention is 90%
    • Among mothers who are between 25 to 30 years old, the probability of getting the encouragement intervention is 80%

Formally, we can rewrite (IV2) and (IV3) as follows:

  • (IV2): \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0), A(1), A(0) \mid X\)
  • (IV3): \(0 < \mathbb{P}(Z=1 \mid X=x) <1\) for all \(x\)

Notice that this is nearly identical to a stratified randomized experiment from the previous lecture, except the randomization is done on \(Z\) instead of \(A\).

Encouragement Effect

If our goal is simply to identify the causal effect of \(Z\) (i.e., the causal effect of encouragement versus usual care), we can use the prior lectures to do this.

For example, suppose we are interested in the effect of the encouragement (i.e., \(Z\)) on mother’s smoking status (i.e., \(A\)), say \(\mathbb{E}[A(1) - A(0)]\).

  • From the previous lectures, under (IV1)-(IV3), we can identify the causal effect \(\mathbb{E}[A(1) - A(0)] = \mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0]\)
  • As we show later, in many IV settings, the magnitude of \(\mathbb{E}[A(1) - A(0)]\), loosely referred to as instrument strength, affects the estimation and inference of the treatment effect of \(A\) on \(Y\).

Intent-to-Treat (ITT) Effect

We can also identify the causal effect of \(Z\) on the outcome \(Y\), which is often called the intent-to-treat (ITT) effect.

\[ {\rm ITT} = \mathbb{E}[Y(A(1),1) - Y(A(0),0)] \]

The ITT effect is also written as \(\mathbb{E}[Y(1)-Y(0)]\) where the counterfactual outcome is re-defined so that \(Y(z) = Y(A(z),z)\).

In words, the ITT effect measures the causal effect of the initial random assignment (or the instrument \(Z\)) on the outcome.

  • The initial random assignment represents the investigator’s intent to assign treatment (or control) to participants in an experiment.
  • In the maternal smoking example, the ITT effect is the causal effect of the encouragement intervention on the newborn’s birth weight
  • The ITT effect does not directly measure the causal effect of maternal smoking on the newborn’s birth weight.
  • The ITT effect is often reported in many randomized experiments and IV studies.

The identification of the ITT effect follows from assumptions (IV1)-(IV3), i.e., \[\begin{align*} \mathbb{E}[Y \mid Z=1] &= \mathbb{E}[Z Y(A(1),1) + (1-Z) Y(A(0),0) \mid Z=1] && \text{(IV1)} \\ &= \mathbb{E}[Y(A(1),1)\mid Z= 1] &&\\ &=\mathbb{E}[Y(A(1),1)] && \text{(IV2)} \end{align*}\] By a similar argument, we have \(\mathbb{E}[Y \mid Z=0] = \mathbb{E}[Y(A(0),0)]\). Combined, we have

\[\mathbb{E}[Y(A(1),1)] - \mathbb{E}[Y(A(0),0)] = \mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]\]
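The ITT identification result above can be illustrated with a simulated encouragement design; the compliance-type shares and effect sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Latent compliance types: 50% compliers, 20% always-takers, 30% never-takers.
t = rng.choice(["co", "at", "nt"], size=n, p=[0.5, 0.2, 0.3])
A1 = np.where(t == "nt", 0, 1)            # A(1)
A0 = np.where(t == "at", 1, 0)            # A(0)

# Counterfactual outcomes satisfying the exclusion restriction: Y(a,z) = Y(a).
Y0 = (t == "at") * 1.0 + rng.normal(0, 1, n)
Y1 = Y0 + np.where(t == "co", 2.0, 0.5)   # heterogeneous effects across types

Z = rng.binomial(1, 0.5, n)               # randomized encouragement (IV2, IV3)
A = Z * A1 + (1 - Z) * A0                 # (IV1, SUTVA of Z)
Y = A * Y1 + (1 - A) * Y0

# ITT from counterfactuals vs. the observed contrast E[Y|Z=1] - E[Y|Z=0]
itt_true = np.mean(Y1 * A1 + Y0 * (1 - A1)) - np.mean(Y1 * A0 + Y0 * (1 - A0))
itt_obs = Y[Z == 1].mean() - Y[Z == 0].mean()
```

In this setup the ITT equals the complier effect times the complier share (\(2 \times 0.5 = 1\)), and the observed contrast matches it.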

Randomized Encouragement Designs and Connection to Missing Data

We can also interpret assumptions (IV1), (IV2), and (IV3) using the data table that includes both counterfactuals \(Y(a,z), A(z)\) and observed variables \(Z,A,Y\):

|       | \(Y(1,1)\) | \(Y(1,0)\) | \(Y(0,1)\) | \(Y(0,0)\) | \(A(1)\) | \(A(0)\) | \(A\) | \(Z\) | \(Y\) |
|-------|------------|------------|------------|------------|----------|----------|-------|-------|-------|
| Chloe | 15         | NA         | NA         | NA         | 1        | NA       | 1     | 1     | 15    |
| Sally | NA         | NA         | 20         | NA         | 0        | NA       | 0     | 1     | 20    |
| Kate  | NA         | NA         | NA         | 18         | NA       | 0        | 0     | 0     | 18    |
| Julie | NA         | 25         | NA         | NA         | NA       | 1        | 1     | 0     | 25    |

The variables \(Z\) and \(A\) both serve as missingness indicators. But, we only make assumptions about the missingness indicator \(Z\) via (IV2) and (IV3); we don’t make any assumptions about the missingness indicator \(A\).

  • In other words, assumptions (IV2) and (IV3) say that the missingness in the columns \(A(1)\) and \(A(0)\) is completely at random (MCAR), as the missingness in these columns is completely determined by \(Z\), which is random by (IV2).
  • But, the missingness of the four \(Y(\cdot)\) columns may not be MCAR because (IV2) and (IV3) do not imply MCAR for \(A\).
    • For these columns, the missingness is determined by \(Z\) and \(A\).
    • For these columns’ missingness to be MCAR, we need \(A,Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0)\).

Because the entries of the columns of \(A(\cdot)\) are MCAR, we can identify the column means of \(A(\cdot)\) by simply taking the mean of the observed entries of \(A(\cdot)\), i.e.,

\[\mathbb{E}[A(1)] = \mathbb{E}[A \mid Z= 1]\]

In contrast, we cannot directly identify all four column means of \(Y(\cdot)\) with the means of the observed entries, as the missingness in these columns is not MCAR.

  • But from the ITT slides above, we can identify a particular “mixture” of columns of \(Y(\cdot)\) so long as the missingness in this “mixture column” is MCAR (via \(Z\)).

|       | \(A(1)\) | \(A(0)\) | \(Y(A(1),1)\) | \(Y(A(0),0)\) | \(Z\) | \(Y\) |
|-------|----------|----------|---------------|---------------|-------|-------|
| Chloe | 1        | NA       | 15            | NA            | 1     | 15    |
| Sally | 0        | NA       | 20            | NA            | 1     | 20    |
| Kate  | NA       | 0        | NA            | 18            | 0     | 18    |
| Julie | NA       | 1        | NA            | 25            | 0     | 25    |

The two new columns \(Y(A(1),1)\) and \(Y(A(0),0)\) essentially combine the four columns \(Y(1,1)\), \(Y(1,0)\), \(Y(0,1)\), and \(Y(0,0)\) so that the missingness in the two new columns only depends on \(Z\).

The column \(Y(A(1),1)\) “fuses” the columns \(Y(1,1)\) and \(Y(0,1)\) (i.e., \(z=1\))

  • In words, this column represents a mixture of two sub-populations of mothers under the encouragement intervention:
    1. mothers who decided to stop smoking after the encouragement (i.e., \(A(1) =1\))
    2. mothers who continued smoking after the encouragement (i.e., \(A(1) =0\))
  • The average of this column represents the average birth weight of infants across these two sub-populations of mothers.

The column \(Y(A(0),0)\) “fuses” the columns \(Y(1,0)\) and \(Y(0,0)\) (i.e., \(z=0\))

  • In words, this column represents a mixture of two sub-populations of mothers under the usual care:
    1. mothers who decided to stop smoking after the usual care (i.e., \(A(0) =1\))
    2. mothers who continued smoking after the usual care (i.e., \(A(0) =0\))
  • The average of this column represents the average birth weight of infants across these two sub-populations of mothers.

As mentioned earlier, for the two columns \(Y(A(1),1)\) and \(Y(A(0),0)\), their missingness pattern is MCAR as the missingness only depends on \(Z\).

  • Thus, we can identify the column mean of \(Y(A(z),z)\) for \(z \in \{0,1\}\) by simply taking the mean of the observed values, i.e., \(\mathbb{E}[Y(A(z),z)] = \mathbb{E}[Y \mid Z=z]\)
  • This matches the identification result for the intent-to-treat effect.

Monotonicity-Based IV Assumptions

Under a randomized encouragement design, we can formalize the assumptions about the instrument \(Z\). This is broadly referred to as “monotonicity-based” IV assumptions.

  • (IV4, Instrument relevance): \(\mathbb{E}[A(1) - A(0)] \neq 0\)
  • (IV5, Exclusion restriction): \(Y(a,1) = Y(a,0) =Y(a)\) for all \(a\)
  • (IV6, Monotonicity/No Defiers): \(\mathbb{P}(A(1) - A(0) \geq 0)=1\)

Assumption (IV4) states that the instrument has a non-zero, causal effect on the treatment receipt.

  • In the maternal smoking example, (IV4) states that the encouragement intervention caused more mothers to quit smoking during pregnancy.
  • Under (IV1)-(IV3), this assumption can be re-written based on the observed data, i.e. \(\mathbb{E}[A(1) -A(0)] = \mathbb{E}[A\mid Z=1] - \mathbb{E}[A \mid Z= 0] \neq 0\).
  • This means that we can directly test (IV4) with the observed data by testing whether \(\mathbb{E}[A\mid Z=1] - \mathbb{E}[A \mid Z= 0]\) is zero or not.

Assumption (IV5) states that the counterfactual outcomes are identical between \(z=1\) and \(z=0\) once the treatment receipt status \(a\) is fixed.

  • In the maternal smoking example, (IV5) states that after fixing the mother’s smoking status, whether the mother was encouraged or not does not affect the birth weight of the newborn.
  • Unlike (IV4), (IV5) cannot be written as a function of the observed data as it requires observing both \(Y(a,1)\) and \(Y(a,0)\). From (IV1), this is not possible.
    • In other words, (IV5) cannot be directly tested with the observed data.
    • But, testable implications exist, i.e., if (IV5) holds, the observed data must satisfy certain constraints. See page 1173 in Balke and Pearl (1997) and Theorem 1 of Wang, Robins, and Richardson (2017) for some examples when the instrument is binary.
  • (IV5) is the most controversial assumption as the other assumptions (IV1)-(IV4) and (IV6) can be plausibly satisfied by the experimental design (e.g., (IV1)-(IV3), (IV6)) or be directly tested with the observed data (e.g., (IV4)).
  • This assumption is referred to as the exclusion restriction (Imbens and Angrist (1994), Angrist, Imbens, and Rubin (1996)).

Assumption (IV6) states that the instrument has a non-negative, causal effect on the treatment receipt for everyone.

If pre-instrument covariates \(X\) are present and are necessary to justify (IV1)-(IV3), for instance the investigator ran a stratified randomized encouragement design, it’s likely the case that we need to modify (IV4)-(IV6) to incorporate \(X\). Otherwise, we may not be able to identify the LATE described below.

There are different ways to reframe (IV4)-(IV6) to incorporate \(X\). One popular approach formalized by Abadie (2003) is:

  • (IV4, Instrument relevance): \(\mathbb{E}[A(1) - A(0) \mid X=x] \neq 0\) for all \(x\).
  • (IV5, Exclusion restriction): \(\mathbb{P}(Y(a,1) = Y(a,0) \mid X=x)=1\) for all \(a\) and \(x\).
  • (IV6, Monotonicity/No Defiers): \(\mathbb{P}(A(1) - A(0) \geq 0 \mid X=x)=1\) for all \(x\).

There is a slightly weaker version of (IV5) where we replace \(\mathbb{P}\) with an expectation \(\mathbb{E}\). But, the practical difference is minimal.

Compliance Types (Angrist, Imbens, and Rubin (1996))

To interpret assumption (IV6), it’s useful to partition individuals based on their counterfactuals \(A(0),A(1)\). Because each \(A(z)\) takes on two values, there are four possible subgroups of individuals based on the joint values of \(A(0), A(1)\):

| \(A(0)\) | \(A(1)\) | Type          |
|----------|----------|---------------|
| 1        | 1        | Always-Takers |
| 0        | 1        | Compliers     |
| 1        | 0        | Defiers       |
| 0        | 0        | Never-Takers  |

The names associated with each \(A(0), A(1)\) (e.g. always-takers, compliers) come from Table 1 of Angrist, Imbens, and Rubin (1996). In the maternal smoking example,

  • Always-takers are mothers who never smoke irrespective of whether they were under the encouragement intervention or the usual care.
  • Compliers are mothers who do not smoke when they were under the encouragement intervention, but would smoke if they were under the usual care.
  • Defiers are mothers who do not smoke when they are under the usual care, but smoke when they are under the encouragement intervention.
  • Never-takers are mothers who always smoke irrespective of whether they were under the encouragement intervention or the usual care.

Assumption (IV6) rules out the existence of defiers in the study population, i.e. individuals who would not take the treatment if randomly assigned to the treatment, but take the treatment if randomly assigned to the control.

Also, we cannot classify everyone in the study population as always-takers, compliers, or never-takers from the observed data.

  • Why? Because this requires observing both \(A(1)\) and \(A(0)\), which is not possible from (IV1, SUTVA).
  • But, as discussed above, we can identify the means \(\mathbb{E}[A(1)]\) and \(\mathbb{E}[A(0)]\) from (IV1)-(IV3): \[ \mathbb{E}[A(1)] = \mathbb{E}[A \mid Z=1], \quad{} \mathbb{E}[A(0)] = \mathbb{E}[A \mid Z=0]\]
  • Under the compliance type framework, these means can be interpreted as follows.
    • \(\mathbb{E}[A(1)] = \mathbb{P}(A(1) = 1)\) represents the proportion of always-takers and compliers as they both have \(A(1)=1\).
    • \(\mathbb{E}[A(0)] = \mathbb{P}(A(0) = 1)\) represents the proportion of always-takers and defiers as they both have \(A(0)=1\).
    • With (IV1)-(IV3) alone, we can only identify the proportions of these mixtures of subgroups, not the proportion of each subgroup.

Some implications of (IV6) include

  • identifying the proportion of always-takers via \(\mathbb{E}[A(0)] = \mathbb{E}[A \mid Z=0]\).
  • identifying the proportion of compliers via \(\mathbb{E}[A(1)-A(0)] = \mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0]\).
  • identifying the proportion of never-takers via \(1 - \mathbb{E}[A \mid Z=1]\)
  • Note that the proportions of always-takers, compliers, and never-takers have to sum to \(1\).
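These implications can be checked in a simulated encouragement design with hypothetical type shares (50% compliers, 20% always-takers, 30% never-takers) and no defiers:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

t = rng.choice(["co", "at", "nt"], size=n, p=[0.5, 0.2, 0.3])  # no defiers (IV6)
A1 = np.where(t == "nt", 0, 1)   # A(1)
A0 = np.where(t == "at", 1, 0)   # A(0)
Z = rng.binomial(1, 0.5, n)      # randomized instrument
A = Z * A1 + (1 - Z) * A0        # observed treatment receipt

p_at = A[Z == 0].mean()                     # E[A | Z=0]: always-takers
p_nt = 1 - A[Z == 1].mean()                 # 1 - E[A | Z=1]: never-takers
p_co = A[Z == 1].mean() - A[Z == 0].mean()  # difference: compliers
# p_at + p_co + p_nt sums to 1 by construction
```

Each estimated proportion is close to the share used to generate the types, and the three estimates sum to one exactly.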

This concept of dividing up the population into sub-types based on the joint distribution of the post-treatment variables (e.g., \(A(1)\) and \(A(0)\)) is sometimes referred to as principal stratification (Frangakis and Rubin (2002)).

One-Sided, Randomized Encouragement Designs

In some experimental designs, we can enforce (IV6) by blocking access to treatment for all individuals who are randomized to the control \(Z=0\), i.e.,

  • (IV6.One, One-Sided Noncompliance): \(A(0) = 0\)

One-sided non-compliance is plausible when \(Z\) represents a new program under evaluation and \(A\) represents the actual enrollment into the new program.

  • In these settings, those who are not randomized into the new program (i.e., \(Z=0\)) usually cannot enroll into the new program (i.e., \(A=0\)).
  • But, those who are randomized into the new program (i.e., \(Z=1\)) can choose to enroll (i.e \(A=1\)) or not enroll (i.e., \(A=0\)) into the program.

Note that (IV6.One) implies (IV6).

Causal Estimand: The Local Average Treatment Effect (LATE)

Under (IV1)-(IV6), we can identify the average treatment effect among the compliers.

  • This quantity is sometimes referred to as the local average treatment effect (LATE) (Imbens and Angrist (1994), Angrist, Imbens, and Rubin (1996)). \[ {\rm LATE} = \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=1, A(0)=0}_{\text{Compliers}}] \]
  • In the maternal smoking example, the \({\rm LATE}\) is the average causal effect of smoking during pregnancy on newborn’s birth weight among complying mothers (i.e. mothers who stop smoking if they were under the encouragement intervention, but smoke if they were under the usual care intervention).

The LATE is not the same as the ATE \(\mathbb{E}[Y(1)-Y(0)]\), which represents the average causal effect of smoking during pregnancy on newborn’s birth weight among all mothers.

The complier effect also differs from other “local” effects, such as the average causal effect of smoking on newborn’s birth weight among never-takers: \(\mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=0, A(0)=0}_{\text{Never-takers}}]\)

From the discussion above, we cannot use the observed data to classify all individuals into the three sub-types (i.e., compliers, always-takers, and never-takers).

  • In other words, LATE identifies the average treatment effect among a subgroup of individuals that are defined by latent classes.
  • This is in contrast to the conditional average treatment effect (CATE, \(\mathbb{E}[Y(1)-Y(0) \mid X=x]\)), which identifies the average treatment effect among a subgroup of individuals that are defined by observed \(X\).

Because this subgroup of individuals is impossible to identify from the data, there is a healthy debate about whether the LATE is a useful estimand:

  • Some references: M. A. Hernán and Robins (2006), Deaton (2010), Imbens (2010), Imbens (2014), Baiocchi, Cheng, and Small (2014), Swanson and Hernán (2014)
  • I personally think the identification of the LATE provides one clear illustration about the difficulty of studying the average treatment effect when strong ignorability fails to hold.

Suppose we have some pre-instrument covariates \(X\), for instance characteristics of mothers in the randomized encouragement design on smoking cessation above. In addition to studying the LATE, it may be interesting to study the conditional local average treatment effect, denoted as \({\rm LATE}(x)\) and defined as

\[ {\rm LATE}(x) = \mathbb{E}[Y(1) - Y(0) \mid A(1)=1, A(0) = 0, X=x] \] In the context of the smoking example above:

  • Suppose \(X\) only contains the mother’s age. \({\rm LATE}(x)\) can represent the effect of smoking during pregnancy among complying mothers who are \(x\) years old.
  • \({\rm LATE}\) is the average of \({\rm LATE}(x)\) over the distribution of \(X\) among compliers

Why study \({\rm LATE}(x)\)?

  • The ATE provides a coarse description of the treatment effect, masking individuals who may actually benefit from treatment together with those who are harmed by treatment by averaging their benefits/harms.
  • Similarly, the LATE masks away compliers who may benefit from treatment from those who may be harmed by treatment.
  • See Johnson, Cao, and Kang (2022) for a real data motivating example.

Finally, using the same strategy to identify \({\rm LATE}\), we can identify \({\rm LATE}(x)\) via \[ {\rm LATE}(x) = \frac{\mathbb{E}[Y \mid Z=1,X=x] - \mathbb{E}[Y \mid Z=0,X=x]}{\mathbb{E}[A \mid Z=1,X=x] - \mathbb{E}[A \mid Z=0,X=x]} \] Also, Section 3 of Wang and Tchetgen Tchetgen (2018) and Theorem 3.1 of Abadie (2003) showed the second part of the equality below. \[ {\rm LATE} = \mathbb{E}_{X \mid A(1)=1, A(0)=0}[{\rm LATE}(X)] = \frac{\mathbb{E}[\mathbb{E}[Y \mid Z=1,X] - \mathbb{E}[Y \mid Z=0,X]]}{\mathbb{E}[\mathbb{E}[A \mid Z=1,X] - \mathbb{E}[A \mid Z=0,X]]} \]
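The conditional Wald ratio above can be sketched in a simulation with a binary covariate; the stratified encouragement probabilities and complier effects (\({\rm LATE}(0)=1\), \({\rm LATE}(1)=3\)) below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

X = rng.binomial(1, 0.5, n)                        # e.g., an age-group indicator
t = rng.choice(["co", "at", "nt"], size=n, p=[0.5, 0.2, 0.3])
A1 = np.where(t == "nt", 0, 1)
A0 = np.where(t == "at", 1, 0)
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + np.where(t == "co", 1.0 + 2.0 * X, 0.5)  # LATE(0)=1, LATE(1)=3

Z = rng.binomial(1, np.where(X == 1, 0.8, 0.4))    # stratified encouragement
A = Z * A1 + (1 - Z) * A0
Y = A * Y1 + (1 - A) * Y0

def cond_wald(mask):
    # Conditional Wald ratio within the stratum selected by `mask`
    num = Y[mask & (Z == 1)].mean() - Y[mask & (Z == 0)].mean()
    den = A[mask & (Z == 1)].mean() - A[mask & (Z == 0)].mean()
    return num / den

late0, late1 = cond_wald(X == 0), cond_wald(X == 1)  # close to 1 and 3
```

Note that the encouragement probability differs across strata of \(X\), as in a stratified design, yet the stratum-specific Wald ratios still recover the stratum-specific complier effects.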

Proof of Identification of the LATE with (IV1)-(IV6)

We will show that under (IV1)-(IV6), we have

\[\begin{align*} {\rm LATE} &= \mathbb{E}[ Y(1) - Y(0) | A(1) - A(0) = 1] \\ &= \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] } \end{align*}\]

We first begin with the numerator of the above ratio. \[\begin{align*} &\mathbb{E}[Y \mid Z=1] \\ =& \mathbb{E}[Z Y(A(1),1) + (1-Z) Y(A(0),0) \mid Z=1] && \text{(IV1, SUTVA)} \\ =& \mathbb{E}[Y(A(1),1)\mid Z= 1] && \\ =& \mathbb{E}[Y(1,1)A(1) + Y(0,1)(1-A(1)) \mid Z=1] && \\ =& \mathbb{E}[Y(1,1)A(1) + Y(0,1)(1-A(1))] && \text{(IV2, Ignorable $Z$)} \\ =& \mathbb{E}[Y(1)A(1) + Y(0)(1-A(1))] && \text{(IV5, Exclusion restriction)} \end{align*}\] Note that (IV3, Positivity of \(Z\)) is needed to ensure that the conditional expectation that conditions on \(\{Z=1\}\) is well-defined. By a similar argument, we have \(\mathbb{E}[Y \mid Z=0] = \mathbb{E}[Y(1)A(0) + Y(0)(1-A(0))]\).

Second, taking the difference between \(\mathbb{E}[Y \mid Z=1]\) and \(\mathbb{E}[Y \mid Z= 0]\), we get \[\begin{align*} &\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0] \\ =& \mathbb{E}[ \{Y(1)A(1) + Y(0)(1-A(1))\} - \{Y(1)A(0) + Y(0)(1-A(0))\}] \\ =&\mathbb{E}[Y(1)\{A(1) - A(0)\} - Y(0)\{A(1) - A(0)\}] \\ =&\mathbb{E}[\{Y(1) - Y(0)\}\{A(1) - A(0)\}] \\ =& \mathbb{E}[\{Y(1) - Y(0)\} I(A(1) - A(0) = 1) - \{Y(1) - Y(0)\} I(A(1) - A(0) = -1) ] \\ =&\mathbb{E}[Y(1) - Y(0) | A(1) - A(0) = 1] \mathbb{P}(A(1) - A(0) = 1) && \text{(IV6, Monotonicity)} \end{align*}\] The fourth equality uses that \(A(1) - A(0) \in \{-1,0,1\}\); the last equality uses (IV6) to drop the defier term (since \(\mathbb{P}(A(1)-A(0)=-1)=0\)) along with the definition of conditional expectation.

Third, we can rewrite the denominator of the ratio above as follows: \[\begin{align*} &\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] \\ =&\mathbb{E}[A(1) - A(0)] && \text{(IV1)-(IV3)} \\ =&\mathbb{P}(A(1) - A(0) = 1) && \text{(IV6)}. \end{align*}\]

Finally, under (IV4, Instrument relevance), we can take the ratio of the two differences and the denominator of this ratio is non-zero: \[\begin{align*} &\frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] } \\ =& \frac{\mathbb{E}[ Y(1) - Y(0) | A(1) - A(0) = 1] \mathbb{P}(A(1) - A(0) = 1) }{\mathbb{P}(A(1) - A(0) = 1)} \\ =&\mathbb{E}[ Y(1) - Y(0) | A(1) - A(0) = 1] \end{align*}\]
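The identification argument above is easy to check numerically. Below is a minimal simulation sketch (all distributions and effect sizes are hypothetical choices, not part of the lecture): principal strata are drawn with no defiers, \(Z\) is randomized, and the sample Wald ratio is compared to the complier average effect.

```python
# Hypothetical simulation sketch: under (IV1)-(IV6) with no defiers, the
# Wald ratio (E[Y|Z=1]-E[Y|Z=0]) / (E[A|Z=1]-E[A|Z=0]) recovers the LATE.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Principal strata: 0 = never-taker, 1 = complier, 2 = always-taker (no defiers, IV6)
stratum = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
Z = rng.integers(0, 2, size=n)               # randomized instrument (IV2, IV3)
A1 = (stratum >= 1).astype(int)              # A(1): compliers and always-takers take it
A0 = (stratum == 2).astype(int)              # A(0): only always-takers take it
A = np.where(Z == 1, A1, A0)                 # consistency (IV1)

# Heterogeneous effects by stratum; Z affects Y only through A (IV5)
effect = np.array([-1.0, 2.0, 0.5])[stratum]
Y0 = rng.normal(0, 1, size=n)
Y1 = Y0 + effect
Y = np.where(A == 1, Y1, Y0)

wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (A[Z == 1].mean() - A[Z == 0].mean())
late = effect[stratum == 1].mean()           # complier average effect (2.0 here)
print(round(wald, 2), round(late, 2))        # both should be close to 2.0
```

Note that the Wald ratio lands on the complier effect (2.0) even though never-takers and always-takers have very different hypothetical effects (-1.0 and 0.5).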

Suppose all the local effects are identical across the subgroups, i.e., \[\begin{align*} &\mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=0, A(0)=0}_{\text{Never-takers}}] \\ =& \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=1, A(0)=0}_{\text{Compliers}}] \\ =& \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=1, A(0)=1}_{\text{Always-takers}}] \\ =& \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=0, A(0)=1}_{\text{Defiers}}], \end{align*}\]

Then, we can identify the ATE with one of the local effects: \[\begin{align*} &\mathbb{E}[Y(1)- Y(0)] \\ =& \sum_{a_1 \in \{0,1\}, a_0 \in \{0,1\}} \mathbb{E}[Y(1)-Y(0) \mid A(1) = a_1, A(0)=a_0] \mathbb{P}(A(1)=a_1,A(0)=a_0) \\ =& \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=1, A(0)=0}_{\text{Compliers}}] \sum_{a_1 \in \{0,1\}, a_0 \in \{0,1\}} \mathbb{P}(A(1)=a_1,A(0)=a_0) \\ =& \mathbb{E}[Y(1) - Y(0) \mid \underbrace{A(1)=1, A(0)=0}_{\text{Compliers}}]. \end{align*}\]

In other words, if the causal effect is homogeneous across the four subgroups, the average effect for a subgroup equals the average effect for the entire population.
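This homogeneity claim can also be checked by simulation. Here is a hypothetical sketch (all numbers are arbitrary choices) where individual effects are drawn independently of the principal strata, so every subgroup has the same average effect:

```python
# Hypothetical sketch: when individual effects are independent of the
# principal strata (so all four subgroup average effects coincide),
# the complier average effect equals the population ATE.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
# Principal strata: 0 = never-taker, 1 = complier, 2 = always-taker
stratum = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
tau = rng.normal(1.5, 1.0, size=n)    # individual effects, independent of stratum

ate = tau.mean()                      # population average effect
late = tau[stratum == 1].mean()       # complier average effect
print(round(ate, 2), round(late, 2))  # both should be close to 1.5
```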

Instrument Under the No-Additive Interaction Assumption

As seen above, one way to identify causal effects of treatment \(A\) when (A2) fails is to restrict the heterogeneity of the treatment effect across latent/unobservable variables.

  • Under the monotonicity framework, we “restricted” treatment effect heterogeneity in the latent space by simply removing defiers (i.e., (IV6)).
  • By assuming away defiers, the causal effect of treatment \(A\) “varies” less in the latent/unobservable space (i.e., among compliers, never-takers, and always-takers).

In a separate line of work by Robins (1994) (see M. A. Hernán and Robins (2006) and Wang and Tchetgen Tchetgen (2018) for more refined versions), an instrument was defined to restrict treatment effect heterogeneity in the latent space through the no additive interaction assumption.

  • As you’ll see below, the same ratio that identified the LATE also identifies the average treatment effect on the treated (ATT) if an instrument is defined in another way that restricts treatment effect heterogeneity.
  • Also, to the best of my knowledge, this set of assumptions was not motivated by a real experimental design, unlike the monotonicity-based framework.

Roughly speaking, the no additive interaction framework assumes the following:

  • (IV1’, Causal consistency): \(Y = Y(A,Z)\)
  • (IV2’, Exchangeable instrument): \(Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0)\)
  • (IV3, Positivity): \(0 < \mathbb{P}(Z=1) <1\)
  • (IV4’, Instrument relevance): \(\mathbb{E}[A \mid Z=1] \neq \mathbb{E}[A \mid Z=0]\)
  • (IV5, Exclusion restriction): \(Y(a,1)=Y(a,0)=Y(a)\) for all \(a\)
  • (IV6’, No additive interaction): Suppose (IV5) holds. We have \(\mathbb{E}[Y(1) - Y(0) | Z=1, A=1] = \mathbb{E}[Y(1) - Y(0) | Z=0, A=1]\). Note that there is an implicit assumption that \(0 < \mathbb{P}(Z=z,A=1) < 1\) for all \(z\).

As with the monotonicity-based assumptions, we can create conditional versions of (IV2’) and (IV3) that condition on \(X\): \[Z \perp Y(1,1), Y(1,0), Y(0,1), Y(0,0) \mid X, \quad{} 0 < \mathbb{P}(Z=1 \mid X=x) <1 \text{ for all $x$} \]

Comparison Between Monotonicity-Based Framework and the No-Additive Interaction Framework

Compared to the monotonicity-based approach, the no-additive interaction framework makes different assumptions about the instrument.

  • The framework does not assume the existence of counterfactuals \(A(1), A(0)\).
  • Assumptions (IV1’) and (IV2’) are similar to (IV1) and (IV2), except that assumptions about the counterfactuals \(A(1), A(0)\) are no longer present. If we assume \(A(z)\) exists, then (IV1) implies (IV1’) and (IV2) implies (IV2’).
  • Assumption (IV4’) states that the instrument is associated with \(A\). In contrast to (IV4), we do not necessarily need to have a causal effect of \(Z\) on \(A\).

Only assumptions (IV3) and (IV5) are identical between the monotonicity-based framework and the no-additive interaction framework.

Interpreting the No-Additive Interaction Assumption (IV6’)

Assumption (IV6’) can be interpreted by writing out a saturated model of the causal effect of treatment \(A\) on the outcome:

\[\mathbb{E}[Y(1) -Y(0) \mid Z=z,A=1] = \beta_{0} + \beta_{1}z.\]

  • A saturated model simply means that all of the variation in the left-hand side of the equality (i.e., the conditional expectation) can be explained by the model on the right-hand side of the equality.
  • The term \(\beta_0\) represents the ATT among individuals with \(Z=0\) and the term \(\beta_0 + \beta_1\) represents the ATT among individuals with \(Z=1\).

Then, assumption (IV6’) implies \(\beta_1 = 0\).

  • In other words, the no additive interaction assumption says that the “ATT effect” (i.e., the average difference of \(Y(1) - Y(0)\) conditional on treated individuals \(A=1\)) is the same among individuals with \(Z=0\) and \(Z=1\).
  • Note that (IV6’) only restricts the effect of \(Z\) on the outcome conditional on \(A=1\).
  • For example, even under (IV6’), it’s possible that \(\mathbb{E}[Y(1) - Y(0) \mid Z=1] \neq \mathbb{E}[Y(1) - Y(0) \mid Z=0]\)

In the context of the maternal smoking example, (IV6’) states that:

  • The effect of smoking on the infant’s birth weight among mothers that smoked during pregnancy (i.e., \(A = 1\)) is the same between mothers under the encouragement intervention (i.e., \(Z = 1\)) and mothers under the usual care (i.e., \(Z = 0\)).

While subtle, the exclusion restriction (IV5) \(Y(a,1)=Y(a,0)=Y(a)\) for all \(a\) is different from the no additive interaction assumption (IV6’) \(\mathbb{E}[Y(1) - Y(0) | Z=1, A=1] = \mathbb{E}[Y(1) - Y(0) | Z=0, A=1]\).

  • (IV5) makes assumptions about counterfactuals only whereas (IV6’) makes assumptions about the counterfactuals and the observables \(Z\) and \(A\).
  • (IV5) makes assumptions about the counterfactual outcomes of everyone whereas (IV6’) only makes an assumption about the average effects.
  • (IV6’) implicitly restricts the causal effect of treatment receipt \(A\) on the outcome \(Y\) whereas (IV5) leaves the contrast \(Y(1) - Y(0)\) unrestricted. In other words, assuming (IV5) does not necessarily mean that (IV6’) holds.

Consider the saturated model of \(\mathbb{E}[Y(1) - Y(0) | Z=z,A=a]\), or effectively a two-way ANOVA model with interactions.

\[ \mathbb{E}[Y(1) - Y(0) | Z=z,A=a] = \beta_0 + \beta_1 z + \beta_2 a + \beta_3 z a \]

(IV6’) states that

\[\begin{align*} \mathbb{E}[Y(1) - Y(0) | Z=1,A=1] &= \mathbb{E}[Y(1) - Y(0) | Z=0,A=1] \Rightarrow \\ \beta_0 + \beta_1 + \beta_2 + \beta_3 &= \beta_0 + \beta_2 \Rightarrow \\ \beta_1 + \beta_3 &= 0 \end{align*}\]

In other words, if the main effect of the instrument \(Z\) is zero (i.e., \(\beta_1 =0\)), (IV6’) implies that the interaction effect must be zero (i.e., \(\beta_3 = 0\)).

However, if the main effect of the instrument \(Z\) is non-zero (i.e., \(\beta_1 \neq 0\)), (IV6’) implies that the interaction effect must be identical in magnitude to the main effect but opposite in sign (i.e., \(\beta_1 = -\beta_3\)).
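The algebra above can be sanity-checked numerically. Here is a tiny hypothetical check, with arbitrary coefficient values chosen so that \(\beta_1 = -\beta_3\):

```python
# Hypothetical numeric check of the saturated-model algebra: when
# beta1 + beta3 = 0 (the constraint implied by IV6'), the conditional
# effect at (z=1, a=1) matches the one at (z=0, a=1).
def cond_effect(z, a, b0, b1, b2, b3):
    """Saturated (two-way ANOVA) model for E[Y(1)-Y(0) | Z=z, A=a]."""
    return b0 + b1 * z + b2 * a + b3 * z * a

b0, b2 = 0.5, 1.0    # arbitrary hypothetical main effects of the intercept and A
b1 = 0.25            # nonzero main effect of Z...
b3 = -b1             # ...so (IV6') forces the interaction to cancel it

print(cond_effect(1, 1, b0, b1, b2, b3))  # 1.5
print(cond_effect(0, 1, b0, b1, b2, b3))  # 1.5
```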

Proof of Identification of the ATT with (IV1’)-(IV6’)

Now, we are ready to show that the ratio that was used to identify the LATE can also identify the ATT under (IV1’)-(IV6’): \[ {\rm ATT} = \mathbb{E}[Y(1) - Y(0) \mid A=1] = \frac{\mathbb{E}[Y \mid Z=1] - \mathbb{E}[Y \mid Z=0]}{\mathbb{E}[A \mid Z=1] - \mathbb{E}[A \mid Z=0] } \] We begin with the numerator of this ratio. \[\begin{align*} &\mathbb{E}[Y \mid Z=z] \\ =& \mathbb{E}[Y(A,Z) \mid Z =z] && \text{(IV1', Consistency)} \\ =& \mathbb{E}[Y(A) \mid Z=z] && \text{(IV5, Exclusion restriction)} \\ =& \mathbb{E}[Y(1) A + Y(0)(1-A) \mid Z=z] \\ =& \mathbb{E}[ (Y(1) - Y(0)) A \mid Z=z] + \mathbb{E}[Y(0) \mid Z=z] \\ =& \mathbb{E}[ (Y(1) - Y(0)) A \mid Z=z] + \mathbb{E}[Y(0)] && \text{(IV2', Exchangeable instrument)} \\ =& \mathbb{E}[Y(1) - Y(0) \mid Z=z,A=1]\mathbb{P}(A=1\mid Z=z) + \mathbb{E}[Y(0)] \end{align*}\] Note that assumption (IV3) is used to ensure that the conditioning event \(\{Z=z\}\) has positive probability, so the conditional expectation is well-defined.

Taking the difference \(\mathbb{E}[Y \mid Z=1]- \mathbb{E}[Y \mid Z=0]\) yields \[\begin{align*} &\mathbb{E}[Y \mid Z=1]- \mathbb{E}[Y \mid Z=0] \\ =& \mathbb{E}[Y(1) - Y(0) \mid Z=1,A=1]\mathbb{P}(A=1\mid Z=1) \\ &\quad{} - \mathbb{E}[Y(1) - Y(0) \mid Z=0,A=1]\mathbb{P}(A=1\mid Z=0) \\ =& \mathbb{E}[Y(1) - Y(0) \mid A=1]\left(\mathbb{P}(A=1 \mid Z=1) - \mathbb{P}(A=1 \mid Z=0)\right) && \text{(IV6', No additive interaction)} \end{align*}\] The last equality utilizes the fact that (IV6’) implies \(\mathbb{E}[Y(1) -Y(0) \mid Z=1,A=1] =\mathbb{E}[Y(1) -Y(0) \mid Z=0,A=1] = \mathbb{E}[Y(1) -Y(0) \mid A=1]\).

Dividing the above expression by \(\mathbb{P}(A=1 \mid Z=1) - \mathbb{P}(A=1 \mid Z=0)\), which is non-zero by assumption (IV4’, Instrument relevance), gives us the desired result.
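The ATT result can likewise be checked by simulation. Below is a hypothetical sketch (all distributions and numbers are illustrative choices): treatment uptake depends on an unmeasured confounder \(U\), individual effects are drawn independently of \(U\) and \(Z\) so that (IV6’) holds, and the naive treated-minus-control contrast is biased while the Wald ratio recovers the ATT.

```python
# Hypothetical simulation sketch: with an unmeasured confounder U, the naive
# treated-vs-control contrast is biased, but under (IV1')-(IV6') the Wald
# ratio still recovers the ATT (effects are independent of U, so IV6' holds).
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
U = rng.normal(0, 1, size=n)                 # unmeasured confounder
Z = rng.integers(0, 2, size=n)               # randomized instrument (IV2', IV3)
# Treatment uptake depends on both Z (relevance, IV4') and U (confounding)
A = (0.8 * Z + U + rng.normal(0, 1, size=n) > 0.5).astype(int)
tau = rng.normal(2.0, 1.0, size=n)           # individual effects, independent of U, Z
Y0 = U + rng.normal(0, 1, size=n)            # Y(0) depends on U -> confounding
Y = Y0 + tau * A                             # consistency (IV1') and exclusion (IV5)

naive = Y[A == 1].mean() - Y[A == 0].mean()  # biased by U
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (A[Z == 1].mean() - A[Z == 0].mean())
att = tau[A == 1].mean()                     # true ATT (2.0 here)
print(round(naive, 2), round(wald, 2), round(att, 2))
```

In this hypothetical design the naive contrast overstates the effect (treated units have larger \(U\), hence larger \(Y(0)\)), while the Wald ratio lands near the ATT of 2.0.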

Recent works have relaxed (IV6’) to allow identification of the ATT (or the ATE); see Wang and Tchetgen Tchetgen (2018) and Cui and Tchetgen Tchetgen (2021).

References

Abadie, Alberto. 2003. “Semiparametric Instrumental Variable Estimation of Treatment Response Models.” Journal of Econometrics 113 (2): 231–63.
Angrist, Joshua D, Guido W Imbens, and Donald B Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444–55.
Baiocchi, Michael, Jing Cheng, and Dylan S Small. 2014. “Instrumental Variable Methods for Causal Inference.” Statistics in Medicine 33 (13): 2297–2340.
Balke, Alexander, and Judea Pearl. 1997. “Bounds on Treatment Effects from Studies with Imperfect Compliance.” Journal of the American Statistical Association 92 (439): 1171–76.
Cui, Yifan, and Eric Tchetgen Tchetgen. 2021. “A Semiparametric Instrumental Variable Approach to Optimal Treatment Regimes Under Endogeneity.” Journal of the American Statistical Association 116 (533): 162–73.
Deaton, Angus. 2010. “Instruments, Randomization, and Learning about Development.” Journal of Economic Literature 48 (2): 424–55.
Frangakis, Constantine E, and Donald B Rubin. 2002. “Principal Stratification in Causal Inference.” Biometrics 58 (1): 21–29.
Hernán, Miguel A, and James M Robins. 2006. “Instruments for Causal Inference: An Epidemiologist’s Dream?” Epidemiology 17 (4): 360–72.
Hernán, Miguel, and James Robins. 2020. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Imbens, Guido W. 2010. “Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009).” Journal of Economic Literature 48 (2): 399–423.
———. 2014. “Instrumental Variables: An Econometrician’s Perspective.” Statistical Science 29 (3): 323–58.
Imbens, Guido W, and Joshua D Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62 (2): 467–75.
Johnson, Michael, Jiongyi Cao, and Hyunseung Kang. 2022. “Detecting Heterogeneous Treatment Effects with Instrumental Variables and Application to the Oregon Health Insurance Experiment.” The Annals of Applied Statistics 16 (2): 1111–29.
Robins, James M. 1994. “Correcting for Non-Compliance in Randomized Trials Using Structural Nested Mean Models.” Communications in Statistics-Theory and Methods 23 (8): 2379–2412.
Rosenbaum, Paul, and Donald Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.
Sexton, Mary, and J Richard Hebel. 1984. “A Clinical Trial of Change in Maternal Smoking and Its Effect on Birth Weight.” JAMA 251 (7): 911–15.
Swanson, Sonja A, and Miguel A Hernán. 2014. “Think Globally, Act Globally: An Epidemiologist’s Perspective on Instrumental Variable Estimation.” Statistical Science 29 (3): 371–74.
Wang, Linbo, James M Robins, and Thomas S Richardson. 2017. “On Falsification of the Binary Instrumental Variable Model.” Biometrika 104 (1): 229–36.
Wang, Linbo, and Eric Tchetgen Tchetgen. 2018. “Bounded, Efficient and Multiply Robust Estimation of Average Treatment Effects Using Instrumental Variables.” Journal of the Royal Statistical Society Series B: Statistical Methodology 80 (3): 531–50.