Scaling Data from Multiple Sources

Ted Enamorado; Gabriel López-Moctezuma; Marc Ratkovic

doi:10.1017/pan.2020.24

Published online by Cambridge University Press: 23 November 2020

Ted Enamorado*: Assistant Professor, Department of Political Science, Washington University in St. Louis, St. Louis, MO 63130, USA. Email: ted@wustl.edu, URL: http://www.tedenamorado.com
Gabriel López-Moctezuma: Assistant Professor, Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA. Email: glmoctezuma@caltech.edu, URL: http://glmoctezuma.com
Marc Ratkovic: Assistant Professor, Department of Politics, Princeton University, Princeton, NJ 08544, USA. Email: ratkovic@princeton.edu, URL: http://www.princeton.edu/~ratkovic
*Corresponding author: Ted Enamorado

Abstract

We introduce a method for scaling two datasets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and its efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives, while recovering the words most associated with each senator’s location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.
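
The idea of one factor common to both datasets plus one factor unique to each can be illustrated with a small simulation. The sketch below is not the estimator proposed in the article: it swaps in a simple stand-in (principal components of each dataset, followed by canonical directions between the two score spaces), and the function names `simulate`, `top_scores`, and `scale_two_sources`, as well as all dimensions and noise levels, are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate(n=200, p1=30, p2=40, noise=0.5, seed=0):
    """Two datasets on the same n units: one shared factor, one own factor each."""
    rng = np.random.default_rng(seed)
    s, o1, o2 = rng.normal(size=(3, n, 1))  # shared, own-1, own-2 factors
    X1 = (s @ rng.normal(size=(1, p1)) + o1 @ rng.normal(size=(1, p1))
          + noise * rng.normal(size=(n, p1)))
    X2 = (s @ rng.normal(size=(1, p2)) + o2 @ rng.normal(size=(1, p2))
          + noise * rng.normal(size=(n, p2)))
    return X1, X2, s, o1, o2


def top_scores(X, k):
    """Orthonormal basis (n x k) for the top-k principal component scores."""
    U, _, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :k]


def scale_two_sources(X1, X2, k_shared=1, k_own=1):
    """Stand-in decomposition (an assumption, not the article's method)."""
    k = k_shared + k_own
    Q1, Q2 = top_scores(X1, k), top_scores(X2, k)
    # Canonical directions between the two score spaces: the leading
    # singular vectors pick out the directions the two spaces share.
    U, _, Vt = np.linalg.svd(Q1.T @ Q2)
    shared = (Q1 @ U[:, :k_shared] + Q2 @ Vt.T[:, :k_shared]) / 2.0
    shared, _ = np.linalg.qr(shared)  # orthonormalize the shared scores
    own = []
    for X in (X1, X2):
        Xc = X - X.mean(axis=0)
        resid = Xc - shared @ (shared.T @ Xc)  # remove the shared part
        own.append(top_scores(resid, k_own))   # leading leftover direction
    return shared, own[0], own[1]


X1, X2, s, o1, o2 = simulate()
shared, own1, own2 = scale_two_sources(X1, X2)
print("corr(shared, truth):", abs(np.corrcoef(shared[:, 0], s[:, 0])[0, 1]))
print("corr(own1, truth):  ", abs(np.corrcoef(own1[:, 0], o1[:, 0])[0, 1]))
print("corr(own2, truth):  ", abs(np.corrcoef(own2[:, 0], o2[:, 0])[0, 1]))
```

Signs of the recovered factors are arbitrary, which is why the comparison uses absolute correlations. The sketch has no covariates and no resampling step, both of which the abstract attributes to the proposed method.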

Keywords

multidimensional scaling; principal component analysis; U.S. Senate

Type: Article
Information: Political Analysis, Volume 29, Issue 2, April 2021, pp. 212–235

DOI: https://doi.org/10.1017/pan.2020.24
Copyright: © The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Betsy Sinclair

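Enamorado et al. Dataset

Replication data (Harvard Dataverse, V1): https://doi.org/10.7910/DVN/FOUVEL
Replication data (Code Ocean, V1): https://doi.org/10.24433/CO.3824807.v1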

Enamorado et al. Supplementary Materials

PDF 502.1 KB
