Estimating Spatial Preferences from Votes and Text

Published online by Cambridge University Press: 03 May 2018

In Song Kim
Assistant Professor, Department of Political Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. Email: insong@mit.EDU, URL:
John Londregan
Professor of Politics and International Affairs, Woodrow Wilson School, Princeton University, Princeton, NJ 08544, USA. Email:, URL:∼jbl/
Marc Ratkovic*
Assistant Professor, Department of Politics, Princeton University, Princeton, NJ 08544, USA. Email:, URL:


We introduce a model that extends the standard vote choice model to encompass text. In our model, votes and speech are generated from a common set of underlying preference parameters. We estimate the parameters with a sparse Gaussian copula factor model that estimates the number of latent dimensions, is robust to outliers, and accounts for zero inflation in the data. To illustrate its workings, we apply our estimator to roll call votes and floor speech from recent sessions of the US Senate. We uncover two stable dimensions: one ideological and the other reflecting Senators’ leadership roles. We then show how the method can leverage common speech to impute missing data, recovering reliable preference estimates for rank-and-file Senators given only leadership votes.
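The core idea that votes and speech share common preference parameters can be illustrated with a much simpler stand-in than the estimator described above. The Python sketch below is a hypothetical toy, not the authors' method: it simulates roll calls and word counts from a single latent preference, maps each column to Gaussian normal scores through its ranks (the rank-based, marginal-free idea behind Gaussian copula models, cf. Hoff 2007), and takes the leading principal component as the preference estimate. It deliberately omits the sparsity prior, selection of the number of dimensions, outlier robustness, and zero-inflation handling; all function names, sample sizes, and parameter values are illustrative assumptions.

```python
# Hypothetical toy sketch: NOT the authors' sparse Bayesian Gaussian copula factor model.
import math
import random
from statistics import NormalDist


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n=60, n_votes=40, n_words=40, seed=1):
    """Hypothetical data: votes and word counts driven by one shared latent preference."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    columns = []
    for _ in range(n_votes):  # probit-style roll calls
        cut, disc = rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0)
        columns.append([1 if disc * t - cut + rng.gauss(0.0, 1.0) > 0 else 0 for t in theta])
    for _ in range(n_words):  # log-linear word counts, rate capped at e^3
        load = rng.gauss(0.0, 1.0)
        columns.append([poisson(rng, math.exp(min(0.5 + load * t, 3.0))) for t in theta])
    return theta, columns


def normal_scores(col):
    """Map a column to centered Gaussian scores via average ranks (ties share a rank)."""
    n = len(col)
    order = sorted(range(n), key=lambda i: col[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and col[order[j + 1]] == col[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    z = [NormalDist().inv_cdf(r / (n + 1)) for r in ranks]
    mean = sum(z) / n
    return [v - mean for v in z]


def leading_scores(columns, iters=200, seed=0):
    """Power iteration on X X^T: returns unit-norm first principal-component scores."""
    rng = random.Random(seed)
    n = len(columns[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        loadings = [sum(c[i] * u[i] for i in range(n)) for c in columns]  # X^T u
        u = [sum(l * c[i] for l, c in zip(loadings, columns)) for i in range(n)]  # X (X^T u)
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
    return u


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


theta, columns = simulate()
Z = [normal_scores(c) for c in columns]
estimate = leading_scores(Z)
print(f"|corr(estimate, true preference)| = {abs(corr(estimate, theta)):.2f}")
```

Because principal components are identified only up to sign, the check compares the absolute correlation with the simulated preferences. A one-dimensional toy like this cannot reproduce the second, leadership-related dimension reported in the abstract.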

Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology. 


Authors’ note: We thank Jong Hee Park, Alex Tahk, Brandon Stewart, Arthur Spirling, Ben Johnson, Tolya Levin, Michael Peress, Kosuke Imai, and seminar audiences at Princeton University, the Universidad del Desarrollo, and the annual meeting of the Society for Political Methodology for comments on this and an earlier draft. Replication data are available through the Harvard Dataverse, doi:10.7910/DVN/AGUVBE.

Contributing Editor: Jonathan N. Katz


References

Albert, James H., and Chib, Siddhartha. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88:669–679.
Aldrich, John, Montgomery, Jacob, and Sparks, David. 2014. Polarization and ideology: partisan sources of low dimensionality in scaled roll call analyses. Political Analysis 22:435–456.
Barbera, Pablo. 2015. Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Analysis 23(1):76–91.
Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
Bonica, Adam. 2014. Mapping the ideological marketplace. American Journal of Political Science 58(2):367–386.
Chib, Siddhartha, and Winkelmann, Rainer. 2001. Markov chain Monte Carlo analysis of correlated count data. Journal of Business and Economic Statistics 19(4):428–435.
Clinton, Joshua, and Meirowitz, Adam. 2003. Integrating voting theory and roll call analysis: a framework. Political Analysis 11:381–396.
Clinton, Joshua, Jackman, Simon, and Rivers, Douglas. 2004. The statistical analysis of roll call data. American Political Science Review 98:355–370.
Elff, Martin. 2013. A dynamic state-space model of coded political texts. Political Analysis 21:217–232.
Gentzkow, Matthew, and Shapiro, Jesse M. 2010. What drives media slant? Evidence from U.S. daily newspapers. Econometrica 78(1):35–71.
Gerrish, Sean, and Blei, David. 2011. Predicting legislative roll calls from text. In Proceedings of the 28th international conference on machine learning, ed. L. Getoor and T. Scheffer, pp. 489–496.
Gerrish, Sean, and Blei, David M. 2012. How they vote: issue-adjusted models of legislative behavior. In Advances in neural information processing systems 25, ed. Pereira, F., Burges, C. J. C., Bottou, L., and Weinberger, K. Q. Red Hook, NY: Curran Associates, Inc., pp. 2753–2761.
Greene, William H. 2000. Econometric analysis. Upper Saddle River, NJ: Prentice Hall.
Grimmer, Justin. 2010. A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Political Analysis 18(1):1–35.
Grimmer, Justin, and Stewart, Brandon. 2013. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21(3):267–297.
Hahn, P. Richard, Carvalho, Carlos M., and Scott, James G. 2012. A sparse factor analytic probit model for congressional voting patterns. Journal of the Royal Statistical Society, Series C (Applied Statistics) 61(4):619–635.
Heckman, James J., and Snyder, James M. Jr. 1997. Linear probability models of the demand for attributes with an empirical application to estimating the preferences of legislators. The RAND Journal of Economics 28:S142–S189.
Hill, Kim Quaile, and Hurley, Patricia A. 2002. Symbolic speeches in the U.S. Senate and their representational implications. The Journal of Politics 64(1):219–231.
Ho, Daniel, and Quinn, Kevin. 2008. Measuring explicit political positions of media. Quarterly Journal of Political Science 3:353–377.
Hoff, Peter D. 2007. Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics 1(1):265–283.
Hopkins, Daniel, and King, Gary. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1):229–247.
Jackman, Simon. 2009. Bayesian analysis for the social sciences. Chichester, UK: Wiley.
Kellermann, Michael. 2012. Estimating ideal points in the British House of Commons using early day motions. American Journal of Political Science 56(3):757–771.
Kim, In Song, Londregan, John, and Ratkovic, Marc. 2017. Replication data for: “Estimating spatial preferences from votes and text”. Harvard Dataverse, doi:10.7910/DVN/AGUVBE.
Kyung, Minjung, Gill, Jeff, Ghosh, Malay, and Casella, George. 2010. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5(2):369–412.
Ladha, Krishna. 1991. A spatial model of legislative voting with perceptual error. Public Choice 68:151–174.
Lauderdale, Benjamin, and Clark, Tom. 2014. Scaling politically meaningful dimensions using texts and votes. American Journal of Political Science 58:754–771.
Laver, Michael, Benoit, Kenneth, and Garry, John. 2003. Extracting policy positions from political text using words as data. American Political Science Review 97(2):311–331.
Lo, James, Proksch, Sven-Oliver, and Slapin, Jonathan B. 2014. Ideological clarity in multiparty competition: a new measure and test using election manifestos. British Journal of Political Science 46:591–610.
Lowe, Will, and Benoit, Kenneth. 2011. Estimating uncertainty in quantitative text analysis. Prepared for the Annual Conference of the Midwest Political Science Association.
Maltzman, Forrest, and Sigelman, Lee. 1996. The politics of talk: unconstrained floor time in the U.S. House of Representatives. The Journal of Politics 58(3):819–830.
Mazumder, R., Hastie, T., and Tibshirani, R. 2010. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research 11:2287–2322.
McKelvey, Richard, and Zavoina, W. 1975. A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology 4:103–120.
Murray, Jared S., Dunson, David B., Carin, Lawrence, and Lucas, Joseph E. 2013. Bayesian Gaussian copula factor models for mixed data. Journal of the American Statistical Association 108(502):656–665.
Neal, Radford. 2011. MCMC using Hamiltonian dynamics. In CRC handbooks of modern statistical methods, vol. 2, ed. Brooks, Steve, Gelman, Andrew, Jones, Galin, and Meng, Xiao-Li. New York: Chapman and Hall, pp. 113–162.
Neyman, Jerzy, and Scott, Elizabeth. 1948. Consistent estimates based on partially consistent observations. Econometrica 16:1–32.
Park, Trevor, and Casella, George. 2008. The Bayesian lasso. Journal of the American Statistical Association 103(482):681–686.
Pitt, Michael, Chan, David, and Kohn, Robert. 2006. Efficient Bayesian inference for Gaussian copula regression models. Biometrika 93(3):537–554.
Poole, Keith. 2007. Changing minds? Not in Congress! Public Choice 131(3):435–451.
Poole, Keith, and Rosenthal, Howard. 1997. Congress: a political economic history of roll call voting. New York: Oxford University Press.
Poole, Keith T., and Rosenthal, Howard. 1985. A spatial model for legislative roll call analysis. American Journal of Political Science 29:357–384.
Quinn, Kevin M. 2004. Bayesian factor analysis for mixed ordinal and continuous responses. Political Analysis 12(4):338–353.
Quinn, Kevin M., Monroe, Burt L., Colaresi, Michael, Crespin, Michael H., and Radev, Dragomir R. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1):209–228.
Roberts, Molly, Stewart, Brandon, Tingley, Dustin, Lucas, Christopher, Leder-Luis, Jetson, Gadarian, Shana, Albertson, Bethany, and Rand, David. 2014. Structural topic models for open ended survey responses. American Journal of Political Science 58:1064–1082.
Simmons, Joseph P., Nelson, Leif D., and Simonsohn, Uri. 2011. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11):1359–1366.
Slapin, Jonathan B., and Proksch, Sven-Oliver. 2008. A scaling model for estimating time series party positions from texts. American Journal of Political Science 52(3):705–722.
Spirling, Arthur. 2012. US treaty-making with American Indians. American Journal of Political Science 56(1):84–97.
Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) 58(1):267–288.
Tipping, Michael E., and Bishop, Christopher M. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B 61(3):611–622.
Vehtari, Aki, Gelman, Andrew, and Gabry, Jonah. 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27(5):1413–1432.
Wang, Eric, Salazar, Esther, Dunson, David, and Carin, Lawrence. 2013. Spatio-temporal modeling of legislation and votes. Bayesian Analysis 8(1):233–268.
Witten, Daniela M., Tibshirani, Robert, and Hastie, Trevor. 2009. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3):515–534.
Supplementary material: Kim et al. Dataset (link)
Supplementary material: Kim et al. supplementary material 1 (file, 337 KB)