Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Published online by Cambridge University Press: 17 March 2020

Reagan Mozer*
Affiliation:
Bentley University, Department of Mathematical Sciences, Waltham, MA 02452-4713, USA. Email: rmozer@bentley.edu
Luke Miratrix
Affiliation:
Harvard Graduate School of Education, Cambridge, MA 02138, USA. Email: luke_miratrix@gse.harvard.edu
Aaron Russell Kaufman
Affiliation:
Division of Social Science, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates. Email: aaronkaufman@nyu.edu
L. Jason Anastasopoulos
Affiliation:
University of Georgia, Department of Public Administration and Policy and Political Science, Athens, GA 30601, USA. Email: ljanastas@uga.edu

Abstract

Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into (1) the choice of text representation and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether, we evaluate over 100 unique text-matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher subjective match quality than current state-of-the-art techniques. We enhance the precision of these results by developing a predictive model to estimate the match quality of pairs of text documents as a function of our various distance scores. This model, which we find successfully mimics human judgment, also allows for approximate and unsupervised evaluation of new procedures in our context. We then employ the identified best method to illustrate the utility of text matching in two applications. First, we engage with a substantive debate in the study of media bias by using text matching to control for topic selection when comparing news articles from thirteen news sources. We then show how conditioning on text data leads to more precise causal inferences in an observational study examining the effects of a medical intervention.
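To make the two-part decomposition in the abstract concrete, the following is a minimal sketch, not the authors' implementation or the API of their textmatch package. It assumes one possible text representation (raw bag-of-words term counts) and one possible distance metric (cosine distance), and pairs each treated document with its closest control document, with replacement, when that distance falls within a caliper. The function names, the caliper value, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Representation step (assumed): lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine_distance(u, v):
    """Distance step (assumed): 1 minus the cosine similarity of two count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def match_documents(treated, control, caliper=0.5):
    """Nearest-neighbor matching with replacement.

    Returns a list of (treated_index, control_index, distance) tuples,
    keeping only pairs whose distance is at most the (hypothetical) caliper.
    """
    control_vecs = [term_vector(doc) for doc in control]
    matches = []
    if not control_vecs:
        return matches
    for i, doc in enumerate(treated):
        vec = term_vector(doc)
        j, dist = min(
            ((j, cosine_distance(vec, cv)) for j, cv in enumerate(control_vecs)),
            key=lambda pair: pair[1],
        )
        if dist <= caliper:
            matches.append((i, j, dist))
    return matches


# Toy documents, invented for this example only.
treated_docs = [
    "senate passes budget bill",
    "local team wins championship game",
]
control_docs = [
    "budget bill passes in senate vote",
    "weather forecast predicts heavy rain",
]
matches = match_documents(treated_docs, control_docs, caliper=0.5)
```

Swapping `term_vector` for a different representation, or `cosine_distance` for a different metric, changes which pairs survive the caliper. That sensitivity to the two choices is what the abstract's evaluation framework is designed to compare.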

Type
Articles
Copyright
Copyright © The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Contributing Editor: Jeff Gill

Supplementary material: File

Mozer et al. supplementary material
