## Appendix A. Proofs

## Proof of Corollary 1.

$\Pr [S(1)=S(2)=\cdots =S(K)]=1$ implies $S=S(1)=S(2)=\cdots =S(K)$, thus ensuring that conditioning on $S=1$ is equivalent to conditioning on $S(1)=S(2)=\cdots =S(K)=1$. Joint independence then implies that $\text{E}\,[Y(z)|S=1,Z=z]-\text{E}\,[Y(z^{\prime })|S=1,Z=z^{\prime }]=\text{E}\,[Y(z)|S(1)=1,S(2)=1,\ldots ,S(K)=1]-\text{E}\,[Y(z^{\prime })|S(1)=1,S(2)=1,\ldots ,S(K)=1]$.

## Proof of Corollary 3.

We prove the claim via a simple counterexample. Suppose $\text{Supp}\,(Z)=\{1,2\}$ and that the joint distribution of $(S(1),S(2),Y(1),Y(2))$ is as displayed. Note that $\text{E}\,[S(2)-S(1)]=0$ and $\text{E}\,[\tau |S(1)=S(2)=1]=\text{E}\,[\tau |S(1)=1,S(2)=0]=\text{E}\,[\tau |S(1)=0,S(2)=1]=0$. Nevertheless, $\text{E}\,[Y|S=1,Z=2]-\text{E}\,[Y|S=1,Z=1]=0-1/2=-1/2.$
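To make the counterexample concrete, the sketch below constructs one joint distribution consistent with the stated moments (this particular distribution is our own, hypothetical choice; the original displayed distribution is not reproduced here) and verifies that conditioning on passing manufactures a $-1/2$ difference despite a zero treatment effect everywhere:

```python
from fractions import Fraction as F

# One joint distribution of (S(1), S(2), Y) consistent with the moments stated
# in the proof (hypothetical: the original displayed distribution is not
# reproduced here). Y(1) = Y(2) = Y in every stratum, so tau = 0 everywhere.
strata = {
    # (S(1), S(2)): (probability, Y)
    (1, 1): (F(1, 2), F(0)),      # always-passers
    (1, 0): (F(1, 4), F(3, 2)),   # pass only under Z=1
    (0, 1): (F(1, 4), F(0)),      # pass only under Z=2
}

# E[S(2) - S(1)] = 0: no net effect of treatment on passing.
assert sum(p * (s2 - s1) for (s1, s2), (p, _) in strata.items()) == 0

def mean_y_given_pass(z):
    """E[Y | S=1, Z=z]: mean outcome among strata passing under arm z."""
    passing = [(p, y) for (s1, s2), (p, y) in strata.items() if (s1, s2)[z - 1] == 1]
    return sum(p * y for p, y in passing) / sum(p for p, _ in passing)

# Conditioning on S=1 manufactures a spurious "effect" of -1/2.
assert mean_y_given_pass(2) - mean_y_given_pass(1) == F(-1, 2)
```

The bias arises purely from composition: the two arms condition on different mixtures of strata, even though $Y(1)=Y(2)$ for every unit.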

## Proof of Proposition 1.

We will follow the general logic of Lee (2009), and technical details carry through from the proof of Lee’s Proposition 1a. Without loss of generality, we consider the upper bound for $\text{E}\,[Y(1)|S(1)=S(2)=\cdots =S(K)=1]$.

Define $U=1$ if $S(2)=\cdots =S(K)=1$, else let $U=0$. Then $\text{E}\,[Y(1)|S(1)=S(2)=\cdots =S(K)=1]=\text{E}\,[Y(1)|U=1,S(1)=1]$. We do not observe the joint distribution of $(Y(1),U)|S(1)=1$, as we never jointly observe $Y(z)$ and $S(z^{\prime })$ for $z\neq z^{\prime }$. Let $p_{U^{0}}=\Pr [U=0|S(1)=1]$. Given continuity of $Y(1)$, then among all possible joint distributions of $(Y(1),U)|S(1)=1$, $\text{E}\,[Y(1)|U=1,S(1)=1]$ is maximized when $U=1$ exactly for those units with $Y(1)\geqslant Q_{Y(1)|S(1)=1}(p_{U^{0}})$, where $Q_{Y(1)|S(1)=1}$ denotes the quantile function of $Y(1)|S(1)=1$. By weak monotonicity of the quantile function, it suffices to maximize $p_{U^{0}}$ to find a maximum for $\text{E}\,[Y(1)|U=1,S(1)=1]$.

We again do not observe the joint distribution of $(U,S(1))$. By $\sigma$-additivity, a sharp upper bound on $\Pr [U=0]$ is obtained when the regions where $S(2),S(3),\ldots ,S(K)$ each equal zero are disjoint, with $\Pr [U=0]=\sum _{k=2}^{K}\Pr [S(k)=0]$. Thus, among all possible joint distributions of $(U,S(1))$, $\Pr [U=0|S(1)=1]=p_{U^{0}}$ is maximized when the region where $U=0$ lies entirely within the region where $S(1)=1$, with $p_{U^{0}}=\frac{\sum _{k=2}^{K}\Pr [S(k)=0]}{\Pr [S(1)=1]}$.

Thus if $\frac{\sum _{k=2}^{K}\Pr [S(k)=0]}{\Pr [S(1)=1]}<1$, a sharp upper bound is given by $\text{E}\,[Y(1)|U=1,S(1)=1]\leqslant \text{E}\,[Y(1)|Y(1)\geqslant Q_{Y(1)|S(1)=1}(\frac{\sum _{k=2}^{K}\Pr [S(k)=0]}{\Pr [S(1)=1]}),S(1)=1]$; otherwise the upper bound is infinite.

By random assignment and SUTVA, the conditional distribution of $Y(1)|S(1)=1$ is equivalent to the conditional distribution of $Y|S=1,Z=1$, and the marginal distribution of each $S(k)$ is equivalent to that of $S|Z=k$. Thus a sharp upper bound is given by $\text{E}\,[Y(1)|U=1,S(1)=1]\leqslant \text{E}\,[Y|Y\geqslant Q_{Y|Z=1,S=1}(\sum _{k=2}^{K}\frac{\Pr [S=0|Z=k]}{\Pr [S=1|Z=1]}),Z=1,S=1]$ when $\sum _{k=2}^{K}\frac{\Pr [S=0|Z=k]}{\Pr [S=1|Z=1]}<1$; otherwise the upper bound is infinite. The bounds are invariant to the indexing of treatments $Z$, thus yielding the general upper bound in Proposition 1. Analogous calculations yield the lower bounds.
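As a concrete illustration, the trimmed-mean bound can be computed in a few lines. This is a sketch under our own naming, not the authors' replication code: `upper_bound`, its interface, and the example data are all assumptions.

```python
import numpy as np

def upper_bound(y, s, z, focal=1):
    """Trimmed-mean upper bound on E[Y(focal) | S(1)=...=S(K)=1].

    A sketch of the Proposition 1 bound: trim the bottom p_trim fraction of
    outcomes among passers in the focal arm, where p_trim sums the failure
    rates of the other arms scaled by the focal arm's passing rate.
    """
    arms = np.unique(z)
    # Trimming proportion: summed failure rates of the non-focal arms,
    # divided by the passing rate in the focal arm.
    p_fail = sum(np.mean(s[z == k] == 0) for k in arms if k != focal)
    p_trim = p_fail / np.mean(s[z == focal] == 1)
    if p_trim >= 1:
        return np.inf  # the bound is uninformative
    y_focal = y[(z == focal) & (s == 1)]
    cutoff = np.quantile(y_focal, p_trim)   # Q_{Y|Z=focal,S=1}(p_trim)
    return y_focal[y_focal >= cutoff].mean()
```

Trimming in the opposite direction (keeping the bottom of the distribution below the complementary quantile) yields the analogous lower bound.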

## Appendix B. Details of Replication of Press, Sagan, and Valentino (2013)

Our replication and preanalysis plan are hosted at EGAP (ID: 20150131AA). Our replication included three major variations, the analysis of which underscores the robustness of PSV’s findings. We discuss each of these analyses in turn below.

First, because the original experiment was performed prior to the onset of the Syrian civil war, we sought to assess whether the results were invariant to shifts in time and context (i.e., whether the results might differ in our replication, given the political changes that have occurred in Syria). We thus randomized whether treatment frames presented the scenario in Syria or Lebanon, which was used as an analog to pre-civil-war Syria; treatments were assigned through a $2\times 5$ factorial design. We found no statistically or substantively significant difference between Syria and Lebanon treatment frames, demonstrating that the results presented in PSV are robust to these temporal and contextual changes.

Second, we analyzed whether the PSV study’s use of posttreatment covariates introduced bias. We added another treatment (rendering our augmented replication a $2\times 2\times 5$ factorial design) that randomized whether subjects answered these questions before or after treatment. This analysis failed to reveal any statistically or substantively significant results.

Third, as noted above, we performed weighting on our survey sample to approximate the experimental population used by PSV. Our subjects were recruited from Mechanical Turk, and likely constituted an unrepresentative sample. As noted in the main text, we used logistic regression and IPW to assign treatment probabilities and corresponding weights for each subject. We did observe differences between the weighted and unweighted analyses, but neither undermined the substantive findings of PSV.

## Appendix C. Simulations

We assume a treatment $Z$ with $\text{Supp}\,(Z)=\{1,2\}$ and $\Pr (Z=1)=1/2$. We generated potential outcomes $Y(1)=Y(2)=\lambda [S(2)-S(1)]+N(0,\sigma )$, and vary $\lambda$, $\sigma$, and the joint distribution of $(S(1),S(2))$. Note that in the simulation, we have assumed that there is no effect of the treatment whatsoever, and the results would be invariant to the introduction of any constant treatment effect. Here $\lambda$ represents the divergence in potential outcomes between those who would pass and those who would fail the manipulation check, and $\sigma$ represents the unexplained variability of potential outcomes. To put our results in asymptopia, we assume $N=1,000$ and perform $100,000$ simulations.
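The design above can be sketched as follows. This is our own minimal reimplementation, not the authors' simulation code: the specific joint law of $(S(1),S(2))$, the 0.9 passing rate, and all names are assumptions, and we use far fewer replications than the paper's 100,000.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bias(lam, sigma, correlated, n=1000, p_pass=0.9):
    """One draw of the post-dropping bias under a null treatment effect."""
    if correlated:
        s1 = s2 = rng.random(n) < p_pass      # rho(S(1), S(2)) = 1
    else:
        s1 = rng.random(n) < p_pass           # rho(S(1), S(2)) = 0
        s2 = rng.random(n) < p_pass
    # Y(1) = Y(2) = lam * [S(2) - S(1)] + N(0, sigma): no treatment effect.
    y = lam * (s2.astype(float) - s1.astype(float)) + rng.normal(0, sigma, n)
    z1 = rng.random(n) < 0.5                  # True means Z=1, else Z=2
    s = np.where(z1, s1, s2)                  # observed manipulation check
    keep = s == 1
    # Difference in means after dropping failers; the estimand is 0.
    return y[~z1 & keep].mean() - y[z1 & keep].mean()

bias_corr = np.mean([simulate_bias(1.0, 1.0, correlated=True) for _ in range(200)])
bias_ind = np.mean([simulate_bias(1.0, 1.0, correlated=False) for _ in range(200)])
```

With perfectly correlated checks, `bias_corr` is approximately zero; with independent checks, `bias_ind` is bounded away from zero, illustrating how dropping failers biases the difference-in-means estimator even under a null effect.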

Table 1 presents the results of our simulations. We first discuss the bias of the difference-in-means estimator after dropping subjects. We show that bias tends to increase as $\lambda$ (the divergence between the average potential outcomes of subjects who would pass the control manipulation check and that of those who would pass the treatment manipulation check) increases; see, e.g., row 1 vs. 2. As failure rates increase, not necessarily differentially across treatment arms, we also see that bias increases; compare rows 1–4 to 5–8 to 9–12. Furthermore, as $\rho (S(1),S(2))$ (the correlation between potential responses to the manipulation check) decreases, bias also increases, as evidenced by, e.g., row 4 vs. row 1.

The width of the bounds also depends on multiple factors. As the variability of potential outcomes increases (characterized by $\sigma$, and to a lesser extent $\lambda$), the width of the bounds increases, as evidenced by comparing, e.g., row 1 vs. 2 vs. 3. The width of the bounds also depends on failure rates; again compare rows 1–4 to 5–8 to 9–12. The bounds do not depend on any unobservable features of the joint distributions of potential outcomes and responses to the manipulation check. To wit, the width of the bounds does not change as $\rho (S(1),S(2))$ is varied; compare, e.g., row 1 to row 4.

Table 1. Simulations demonstrating the effects of dropping. Simulations performed with $N=1,000$ and $100,000$ simulations; bound widths are presented as averages over all simulations.

## Appendix D. Additional (Weighted) Summary Statistics

Below, we present distributions of the reweighted covariate profiles of subjects in our replication study, disaggregated by treatment condition and performance on the manipulation checks.

Table 2. Weighted covariate distributions among subjects who failed the manipulation check.

Table 3. Weighted covariate distributions among subjects who passed the manipulation check.

Table 4. Weighted covariate distributions for all subjects.

## Appendix E. Additional (Unweighted) Summary Statistics

Below, we present distributions of the unweighted covariate profiles of subjects in our replication study, disaggregated by treatment condition and performance on the manipulation checks.

Figure 2. Unweighted Results from Press, Sagan, and Valentino (2013) and Replication. Comparisons of original and unweighted replication data. Panel A presents results from PSV with subjects dropped; Panel B presents results from the replication with subjects dropped; Panel C presents results from the replication using the full sample; Panel D presents results imputing the lower bounds for all treatment conditions; Panel E presents results imputing the upper bounds for all treatment conditions. Vertical bars represent 95% confidence intervals on point estimates calculated using the bootstrap.

Table 5. Unweighted covariate distributions among subjects who failed the manipulation check.

Table 6. Unweighted covariate distributions among subjects who passed the manipulation check.

Table 7. Unweighted covariate distributions for all subjects.

## References

Angrist, J. D., Imbens, G. W., and Rubin, D. B. 1996. “Identification of Causal Effects Using Instrumental Variables.” *Journal of the American Statistical Association* 91(434):444–455.

Aronow, P. M., Baron, J., and Pinson, L. 2018. “Replication Data for: A Note on Dropping Experimental Subjects who Fail a Manipulation Check.” https://doi.org/10.7910/DVN/GXXYMH, Harvard Dataverse, V1.

Berinsky, A. J., Margolis, M. F., and Sances, M. W. 2014. “Separating the Shirkers from the Workers? Making Sure Respondents Pay Attention on Self-Administered Surveys.” *American Journal of Political Science* 58(3):739–753.

Crawford, J. T., Brady, J. L., Pilanski, J. M., and Erny, H. 2013. “Differential Effects of Right-Wing Authoritarianism and Social Dominance Orientation on Political Candidate Support: The Moderating Role of Message Framing.” *Journal of Social and Political Psychology* 1(1):5–28.

De Oliveira, P., Guimond, S., and Dambrun, M. 2012. “Power and Legitimizing Ideologies in Hierarchy-Enhancing versus Hierarchy-Attenuating Environments.” *Political Psychology* 33(6):867–885.

Gerber, A. S., and Green, D. P. 2012. *Field Experiments: Design, Analysis, and Interpretation*. W. W. Norton.

Hauser, D. J., and Schwarz, N. 2016. “Attentive Turkers: MTurk Participants Perform Better on Online Attention Checks than do Subject Pool Participants.” *Behavior Research Methods* 48(1):400–407.

Hoffman, A. M., Agnew, C. R., VanderDrift, L. E., and Kulzick, R. 2013. “Norms, Diplomatic Alternatives, and the Social Psychology of War Support.” *Journal of Conflict Resolution* 59(1):3–28.

Lee, D. S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” *The Review of Economic Studies* 76(3):1071–1102.

Maoz, I. 2006. “The Effect of News Coverage Concerning the Opponents’ Reaction to a Concession on its Evaluation in the Israeli–Palestinian Conflict.” *The Harvard International Journal of Press/Politics* 11(4):70–88.

Oppenheimer, D. M., Meyvis, T., and Davidenko, N. 2009. “Instructional Manipulation Checks: Detecting Satisficing to Increase Statistical Power.” *Journal of Experimental Social Psychology* 45(4):867–872.

Peer, E., Vosgerau, J., and Acquisti, A. 2014. “Reputation as a Sufficient Condition for Data Quality on Amazon Mechanical Turk.” *Behavior Research Methods* 46(4):1023–1031.

Press, D. G., Sagan, S. D., and Valentino, B. A. 2013. “Atomic Aversion: Experimental Evidence on Taboos, Traditions, and the Non-Use of Nuclear Weapons.” *American Political Science Review* 107(1):188–206.

Rubin, D. B. 1980. “Comment.” *Journal of the American Statistical Association* 75(371):591–593.

Small, D. A., Lerner, J. S., and Fischhoff, B. 2006. “Emotion Priming and Attributions for Terrorism: Americans’ Reactions in a National Field Experiment.” *Political Psychology* 27(2):289–298.

Turner, J. 2007. “The Messenger Overwhelming the Message: Ideological Cues and Perceptions of Bias in Television News.” *Political Behavior* 29(4):441–464.

Wilson, T. D., Aronson, E., and Carlsmith, K. 2010. “The Art of Laboratory Experimentation.” In *Handbook of Social Psychology*, 5th edn, edited by Fiske, S. T., Gilbert, D. T., Lindzey, G., and Jongsma, A. E., 51–81. Hoboken, NJ: Wiley.

Zhang, J. L., and Rubin, D. B. 2003. “Estimation of Causal Effects via Principal Stratification When Some Outcomes are Truncated by ‘Death.’” *Journal of Educational and Behavioral Statistics* 28(4):353–368.

1 The supplementary information provides an extensive bibliography of studies and dissertations that drop or otherwise statistically condition on posttreatment manipulation checks. Articles were found using a series of searches on Google Scholar for “experiment manipulation check,” “experiment manipulation attention check,” “experiment manipulation attention check political science,” “‘manipulation check’ ‘attention check’ screen*,” “political science ‘manipulation check’ ‘attention check’ screen*”; searches for dissertations were performed on ProQuest using “experiment manipulation attention check,” which was the most inclusive search on Google Scholar. Articles suspected to use manipulation or posttreatment attention checks as a statistical conditioning strategy were then coded independently by two readers. With the exception of dissertations, when either reader was unsure about how the study was conducted or the readers disagreed, the authors of the article (starting with the corresponding author) were e-mailed for clarification. We sent e-mails to authors regarding 42 articles, all of which received responses, and 28 of which were confirmed to drop subjects based on a manipulation (or other posttreatment) check.

2 For recent examples in political science, see Maoz (2006), Small, Lerner, and Fischhoff (2006), Turner (2007), De Oliveira, Guimond, and Dambrun (2012), Crawford *et al.* (2013), and Hoffman *et al.* (2013).

3 The point has been made before, but has not to our knowledge been formalized. For example, Gerber and Green (2012, p. 212) note that attrition may be induced when “[r]esearchers deliberately discard observations. Perhaps ill-advisedly, laboratory researchers sometimes exclude from their analysis subjects who seem not to understand the instructions or who fail to take the experimental situation seriously,” but they do not provide further discussion of this point.

4 Replication data are available from Aronow, Baron, and Pinson (2018).

5 We thank Ben Miller for helpful discussions regarding the formulation of Proposition 1.

6 Three subjects were omitted from analysis because of technical difficulties that prevented us from verifying that they completed the survey; 2,730 subjects are included in the analysis below.

7 Let $R_{i}=1$ if observation $i$ is in the replication study, and let $R_{i}=0$ if the observation is in the original data. We performed a logistic regression of $R_{i}$ on the following covariates $\mathbf{X}_{i}$: Education, Party, Religion, Political Interest, Income, Gender, News Interest, Voter Registration, Birth Year, Region, Race, and Ideology (with mean-imputation for missingness). Using the output of this logistic regression, we computed a predicted value $p_{i}=\Pr [R_{i}=1|\mathbf{X}_{i}]$ for each observation $i$. To reweight the replication study to the original study’s covariate profile, we weighted each observation in the replication sample by $\frac{p_{i}}{1-p_{i}}$.
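The reweighting procedure in this footnote can be sketched as below. The covariate matrix and membership indicator here are simulated placeholders (not the actual survey variables), and scikit-learn's default logistic regression stands in for the fit actually used:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of the footnote's scheme: regress the replication indicator R on
# covariates X, then weight replication observations by p / (1 - p).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                                   # stand-in covariates
r = (rng.random(500) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # 1 = replication

model = LogisticRegression().fit(X, r)
p = model.predict_proba(X)[:, 1]          # p_i = Pr[R_i = 1 | X_i]
weights = (p / (1 - p))[r == 1]           # odds weights for replication sample
```

In practice one would also inspect the distribution of the fitted weights, since near-zero or near-one predicted probabilities produce extreme weights and unstable weighted estimates.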