Why Propensity Scores Should Not Be Used for Matching

  • Gary King and Richard Nielsen

Abstract

We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.
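
The abstract's central claim, that matching on a scalar score can pair units that are far apart on the underlying covariates, can be illustrated with a small simulation. The sketch below is not the authors' analysis or replication code. It assumes two uniform covariates, a treatment probability that depends on their sum, use of the true propensity score from the simulation rather than an estimated one, nearest-neighbor matching with replacement, and plain Euclidean distance standing in for other covariate-based metrics. All function names are hypothetical.

```python
import math
import random


def simulate(n, seed=0):
    """Draw n units with two uniform covariates (an assumed toy design).
    Treatment probability is a logistic function of x1 + x2."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        # True propensity score under the assumed assignment mechanism.
        score = 1.0 / (1.0 + math.exp(-(x1 + x2 - 1.0)))
        treated = rng.random() < score
        units.append({"x": (x1, x2), "score": score, "treated": treated})
    return units


def covariate_distance(a, b):
    """Euclidean distance between two units in covariate space."""
    return math.dist(a["x"], b["x"])


def mean_pair_distance(units, match_key):
    """Match each treated unit to its nearest control (with replacement)
    under match_key, then return the mean covariate distance within pairs."""
    treated = [u for u in units if u["treated"]]
    controls = [u for u in units if not u["treated"]]
    total = 0.0
    for t in treated:
        match = min(controls, key=lambda c: match_key(t, c))
        total += covariate_distance(t, match)
    return total / len(treated)


units = simulate(500)
# Pair on the scalar propensity score only.
psm = mean_pair_distance(units, lambda t, c: abs(t["score"] - c["score"]))
# Pair directly on the covariates.
direct = mean_pair_distance(units, covariate_distance)
print(f"mean covariate distance, score matching:     {psm:.3f}")
print(f"mean covariate distance, covariate matching: {direct:.3f}")
```

Because each treated unit's covariate-nearest control is by construction at least as close as its score-nearest control, the second figure can never exceed the first in this setup. The size of the gap depends on the assumed design and should not be read as an estimate of the effects reported in the article.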

Footnotes

Authors’ note: The current version of this paper, along with a Supplementary Appendix, can be found at j.mp/PScore. We thank Alberto Abadie, Alan Dafoe, Justin Grimmer, Jens Hainmueller, Chad Hazlett, Seth Hill, Stefano Iacus, Kosuke Imai, Simon Jackman, John Londregan, Adam Meirowitz, Giuseppe Porro, Molly Roberts, Jamie Robins, Bradley Spahn, Brandon Stewart, Liz Stuart, Chris Winship, and Yiqing Xu for helpful suggestions, and Connor Jerzak, Chris Lucas, and Jason Sclar for superb research assistance. We also appreciate the insights from our collaborators on a previous related project, Carter Coberley, James E. Pope, and Aaron Wells. All data necessary to replicate the results in this article are available at Nielsen and King (2019).

Contributing Editor: Jeff Gill

References

Abadie, A., and Imbens, G. W. 2006. “Large Sample Properties of Matching Estimators for Average Treatment Effects.” Econometrica 74(1):235–267.
Athey, S., and Imbens, G. W. 2015. “A Measure of Robustness to Misspecification.” American Economic Review Papers and Proceedings 105(5):476–480.
Austin, P. C. 2008. “A Critical Appraisal of Propensity-Score Matching in the Medical Literature Between 1996 and 2003.” Journal of the American Statistical Association 72:2037–2049.
Austin, P. C. 2009. “Some Methods of Propensity-Score Matching had Superior Performance to Others: Results of an Empirical Investigation and Monte Carlo Simulations.” Biometrical Journal 51(1):171–184.
Banaji, M. R., and Greenwald, A. G. 2016. Blindspot: Hidden Biases of Good People. New York: Bantam.
Bansal, P. P., and Ardell, A. J. 1972. “Average Nearest-Neighbor Distances Between Uniformly Distributed Finite Particles.” Metallography 5(2):97–111.
Barnow, B. S., Cain, G. G., and Goldberger, A. S. 1980. “Issues in the Analysis of Selectivity Bias.” In Evaluation Studies, vol. 5, edited by Stromsdorfer, E. and Farkas, G. San Francisco: Sage.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. 1978. Statistics for Experimenters. New York: Wiley-Interscience.
Brookhart, M. A., Schneeweiss, S., Rothman, K. J., Glynn, R. J., Avorn, J., and Sturmer, T. 2006. “Variable Selection for Propensity Score Models.” American Journal of Epidemiology 163:1149–1156.
Caliendo, M., and Kopeinig, S. 2008. “Some Practical Guidance for the Implementation of Propensity Score Matching.” Journal of Economic Surveys 22(1):31–72.
Crump, R. K., Hotz, V. J., Imbens, G. W., and Mitnik, O. 2009. “Dealing with Limited Overlap in Estimation of Average Treatment Effects.” Biometrika 96(1):187.
D’Agostino, R. B. 1998. “Propensity Score Methods for Bias Reduction in the Comparison of a Treatment to a Non-Randomized Control Group.” Statistics in Medicine 17:2265–2281.
Dehejia, R. 2004. “Estimating Causal Effects in Nonexperimental Studies.” In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, edited by Gelman, A. and Meng, X.-L. New York: Wiley.
Diamond, A., and Sekhon, J. S. 2012. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3):932–945.
Drake, C. 1993. “Effects of Misspecification of the Propensity Score on Estimators of Treatment Effects.” Biometrics 49:1231–1236.
Efron, B. 2014. “Estimation and Accuracy After Model Selection.” Journal of the American Statistical Association 109(507):991–1007.
Finkel, S. E., Horowitz, J., and Rojo-Mendoza, R. T. 2012. “Civic Education and Democratic Backsliding in the Wake of Kenya’s Post-2007 Election Violence.” Journal of Politics 74(01):52–65.
Glazerman, S., Levy, D. M., and Myers, D. 2003. “Nonexperimental Versus Experimental Estimates of Earnings Impacts.” The Annals of the American Academy of Political and Social Science 589:63–93.
Greevy, R., Lu, B., Silver, J. H., and Rosenbaum, P. R. 2004. “Optimal Multivariate Matching Before Randomization.” Biostatistics 5(2):263–275.
Gu, X. S., and Rosenbaum, P. R. 1993. “Comparison of Multivariate Matching Methods: Structures, Distances, and Algorithms.” Journal of Computational and Graphical Statistics 2:405–420.
Heckman, J., Ichimura, H., and Todd, P. 1998. “Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Program.” Review of Economic Studies 65:261–294.
Hill, J. 2008. “Discussion of Research Using Propensity-Score Matching: Comments on ‘A Critical Appraisal of Propensity-Score Matching in the Medical Literature Between 1996 and 2003’ by Peter Austin, Statistics in Medicine.” Statistics in Medicine 27(12):2055–2061.
Ho, D. E., Imai, K., King, G., and Stuart, E. A. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15:199–236. URL: j.mp/matchP.
Holland, P. W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81:945–960.
Iacus, S. M., King, G., and Porro, G. 2011. “Multivariate Matching Methods that are Monotonic Imbalance Bounding.” Journal of the American Statistical Association 106:345–361. URL: j.mp/matchMIB.
Imai, K., King, G., and Nall, C. 2009. “The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation.” Statistical Science 24(1):29–53. URL: j.mp/essrole.
Imai, K., King, G., and Stuart, E. A. 2008. “Misunderstandings Among Experimentalists and Observationalists about Causal Inference.” Journal of the Royal Statistical Society, Series A 171(2):481–502. URL: j.mp/misunEO.
Imai, K., and Ratkovic, M. 2014. “Covariate Balancing Propensity Score.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(1):243–263.
Imbens, G. W. 2004. “Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review.” Review of Economics and Statistics 86(1):4–29.
Imbens, G. W., and Rubin, D. B. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press.
Ioannidis, J. P. A. 2005. “Why Most Published Research Findings are False.” PLoS Medicine 2(8):e124.
Kahneman, D. 2011. Thinking, Fast and Slow. London: Macmillan.
Kallus, N. 2018. “Optimal A Priori Balance in The Design of Controlled Experiments.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80(1):85–112.
Kang, J. D. Y., and Schafer, J. L. 2007. “Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data.” Statistical Science 22(4):523–539.
King, G., and Zeng, L. 2006. “The Dangers of Extreme Counterfactuals.” Political Analysis 14(2):131–159. URL: j.mp/dangerEC.
King, G., and Zeng, L. 2007. “When Can History Be Our Guide? The Pitfalls of Counterfactual Inference.” International Studies Quarterly, 183–210. URL: j.mp/pitfallsH.
Lechner, M. 2001. “Identification and Estimation of Causal Effects of Multiple Treatments under the Conditional Independence Assumption.” In Econometric Evaluation of Labour Market Policies, edited by Lechner, M. and Pfeiffer, F., 43–58. Heidelberg: Physica.
Lunceford, J. K., and Davidian, M. 2004. “Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study.” Statistics in Medicine 23(19):2937–2960.
Mahoney, M. J. 1977. “Publication Prejudices: An Experimental Study of Confirmatory Bias in the Peer Review System.” Cognitive Therapy and Research 1(2):161–175.
Mielke, P., and Berry, K. 2007. Permutation Methods: A Distance Function Approach. New York: Springer.
Morgan, S. L., and Winship, C. 2014. Counterfactuals and Causal Inference: Methods and Principles for Social Research, 2nd edn. Cambridge: Cambridge University Press.
Nielsen, R., Findley, M., Davis, Z., Candland, T., and Nielson, D. 2011. “Foreign Aid Shocks as a Cause of Violent Armed Conflict.” American Journal of Political Science 55(2):219–232.
Nielsen, R., and King, G. 2019. “Replication Data for: Why Propensity Scores Should Not Be Used for Matching.” https://doi.org/10.7910/DVN/A9LZNV, Harvard Dataverse, V1.
Pearl, J. 2009. “Myth, Confusion, and Science in Causal Analysis.” Unpublished paper, http://web.cs.ucla.edu/~kaoru/r348.pdf.
Pearl, J. 2009. “The Foundations of Causal Inference.” Sociological Methodology 40(1):75–149.
Peikes, D. N., Moreno, L., and Orzol, S. M. 2008. “Propensity Score Matching.” The American Statistician 62(3):222–231.
Pimentel, S. D., Page, L. C., Lenard, M., and Keele, L. 2018. “Optimal Multilevel Matching Using Network Flows: An Application to a Summer Reading Intervention.” The Annals of Applied Statistics 12(3):1479–1505.
Robins, J. M., Hernan, M. A., and Brumback, B. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology 11(5):550–560.
Robins, J. M., and Morgenstern, H. 1987. “The Foundations of Confounding in Epidemiology.” Computers & Mathematics with Applications 14(9):869–916.
Rosenbaum, P. R., Ross, R., and Silber, J. 2007. “Minimum Distance Matched Sampling With Fine Balance in an Observational Study of Treatment for Ovarian Cancer.” Journal of the American Statistical Association 102(477):75–83.
Rosenbaum, P. R., and Rubin, D. B. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70:41–55.
Rosenbaum, P. R., and Rubin, D. B. 1984. “Reducing Bias in Observational Studies Using Subclassification on the Propensity Score.” Journal of the American Statistical Association 79:515–524.
Rosenbaum, P. R., and Rubin, D. B. 1985a. “Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score.” The American Statistician 39:33–38.
Rosenbaum, P. R., and Rubin, D. B. 1985b. “The Bias Due to Incomplete Matching.” Biometrics 41(1):103–116.
Rubin, D. B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 6:688–701.
Rubin, D. B. 1976. “Inference and Missing Data.” Biometrika 63:581–592.
Rubin, D. B. 1980. “Comments on ‘Randomization Analysis of Experimental Data: The Fisher Randomization Test’, by D. Basu.” Journal of the American Statistical Association 75:591–593.
Rubin, D. B. 2008a. “Comment: The Design and Analysis of Gold Standard Randomized Experiments.” Journal of the American Statistical Association 103(484):1350–1353.
Rubin, D. B. 2008b. “For Objective Causal Inference, Design Trumps Analysis.” Annals of Applied Statistics 2(3):808–840.
Rubin, D. B. 2009. “Should Observational Studies be Designed to Allow Lack of Balance in Covariate Distributions Across Treatment Groups?” Statistics in Medicine 28:1415–1424.
Rubin, D. B. 2010. “On the Limitations of Comparative Effectiveness Research.” Statistics in Medicine 29(19):1991–1995.
Rubin, D. B., and Stuart, E. A. 2006. “Affinely Invariant Matching Methods with Discriminant Mixtures of Proportional Ellipsoidally Symmetric Distributions.” Annals of Statistics 34(4):1814–1826.
Rubin, D. B., and Thomas, N. 2000. “Combining Propensity Score Matching with Additional Adjustments for Prognostic Covariates.” Journal of the American Statistical Association 95:573–585.
Simmons, J. P., Nelson, L. D., and Simonsohn, U. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22(11):1359–1366.
Smith, J. A., and Todd, P. E. 2005a. “Does Matching Overcome LaLonde’s Critique of Nonexperimental Estimators?” Journal of Econometrics 125(1–2):305–353.
Smith, J., and Todd, P. 2005b. “Rejoinder.” Journal of Econometrics 125:365–375.
Stuart, E. A. 2010. “Matching Methods for Causal Inference: A Review and a Look Forward.” Statistical Science 25(1):1–21.
Stuart, E. A., and Rubin, D. B. 2007. “Best Practices in Quasi-Experimental Designs: Matching Methods for Causal Inference.” In Best Practices in Quantitative Methods, edited by Osborne, J., 155–176. New York: Sage.
Stuart, E. A., and Rubin, D. B. 2008. “Matching with Multiple Control Groups with Adjustment for Group Differences.” Journal of Educational and Behavioral Statistics 33(3):279–306.
Tetlock, P. E. 2005. Expert Political Judgment: How Good Is It? How Can We Know? Princeton: Princeton University Press.
VanderWeele, T. J., and Hernan, M. A. 2012. “Causal Inference Under Multiple Versions of Treatment.” Journal of Causal Inference 1:1–20.
VanderWeele, T. J., and Shpitser, I. 2011. “A New Criterion for Confounder Selection.” Biometrics 67(4):1406–1413.
Vansteelandt, S., and Daniel, R. 2014. “On Regression Adjustment for the Propensity Score.” Statistics in Medicine 33(23):4053–4072.
Wilson, T. D., and Brekke, N. 1994. “Mental Contamination and Mental Correction: Unwanted Influences on Judgments and Evaluations.” Psychological Bulletin 116(1):117.
Zhao, Z. 2008. “Sensitivity of Propensity Score Methods to the Specifications.” Economics Letters 98(3):309–319.
Zubizarreta, J. R., Paredes, R. D., and Rosenbaum, P. R. 2014. “Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of For-Profit and Not-For-Profit High Schools in Chile.” The Annals of Applied Statistics 8(1):204–231.
Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis

Supplementary materials

King and Nielsen supplementary material (477 KB)