Developing Standards for Post-Hoc Weighting in Population-Based Survey Experiments

Annie Franco; Neil Malhotra; Gabor Simonovits; L. J. Zigerell

doi:10.1017/XPS.2017.2


Published online by Cambridge University Press: 12 October 2017


Annie Franco: Department of Political Science, Stanford University, Stanford, CA, USA, e-mail: annniefranco@gmail.com
Neil Malhotra: Graduate School of Business, Stanford University, Stanford, CA, USA, e-mail: neilm@stanford.edu
Gabor Simonovits: Department of Politics, New York University, New York, NY, USA, e-mail: simonovits@nyu.edu
L. J. Zigerell: Department of Politics and Government, Illinois State University, Normal, IL, USA, e-mail: ljzigerell@ilstu.edu


Abstract

Weighting techniques are employed to generalize results from survey experiments to populations of theoretical and substantive interest. Although weighting is often viewed as a second-order methodological issue, these adjustment methods invoke untestable assumptions about the nature of sample selection and potential heterogeneity in the treatment effect. Therefore, although weighting is a useful technique in estimating population quantities, it can introduce bias and also be used as a researcher degree of freedom. We review survey experiments published in three major journals from 2000 to 2015 and find that there are no standard operating procedures for weighting survey experiments. We argue that all survey experiments should report the sample average treatment effect (SATE). Researchers seeking to generalize to a broader population can weight to estimate the population average treatment effect (PATE), but should discuss the construction and application of weights in a detailed and transparent manner given the possibility that weighting can introduce bias.
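To make the distinction between the two estimands concrete, the sketch below contrasts an unweighted difference in means (an estimate of the SATE) with a weighted difference in means (one simple way to estimate the PATE). It is an illustration only and not the authors' procedure: the toy data, the weights, and the function names `sate` and `weighted_ate` are all assumptions introduced here.

```python
# Illustrative sketch only: toy data and hypothetical function names,
# not taken from the article or its online appendix.

def sate(outcomes, treated):
    """Unweighted difference in means between treated and control units."""
    t = [y for y, d in zip(outcomes, treated) if d == 1]
    c = [y for y, d in zip(outcomes, treated) if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)

def weighted_ate(outcomes, treated, weights):
    """Weighted difference in means, one simple estimator of the PATE."""
    def wmean(arm):
        num = sum(w * y for y, d, w in zip(outcomes, treated, weights) if d == arm)
        den = sum(w for d, w in zip(treated, weights) if d == arm)
        return num / den
    return wmean(1) - wmean(0)

# Toy sample with heterogeneous effects (assumed values):
# the first four units have a treatment effect of 2 and weight 1.0;
# the last two units have a treatment effect of 0 and weight 3.0,
# i.e. they stand in for a group underrepresented in the sample.
outcomes = [3.0, 1.0, 3.0, 1.0, 1.0, 1.0]
treated = [1, 0, 1, 0, 1, 0]
weights = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0]

sate_hat = sate(outcomes, treated)                    # 4/3, about 1.33
pate_hat = weighted_ate(outcomes, treated, weights)   # 0.8

print(f"SATE estimate: {sate_hat:.3f}")
print(f"Weighted (PATE) estimate: {pate_hat:.3f}")
```

In this toy example the two estimates differ only because the treatment effect varies across groups and the weights shift the group composition. The weighted figure is only as credible as the weights themselves, which is why reporting both quantities alongside a description of how the weights were built is informative.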

Keywords

Survey experiment; weighting; external validity; representativeness; transparency; SATE; PATE

Type: Research Article
Information: Journal of Experimental Political Science, Volume 4, Issue 2, Summer 2017, pp. 161–172

DOI: https://doi.org/10.1017/XPS.2017.2
Copyright: Copyright © The Experimental Research Section of the American Political Science Association 2017



Franco et al supplementary material

Online Appendix

