Bayesian and Likelihood Inference for 2 × 2 Ecological Tables: An Incomplete-Data Approach

Kosuke Imai; Ying Lu; Aaron Strauss

doi:10.1093/pan/mpm017

Published online by Cambridge University Press: 13 August 2007

Kosuke Imai*: Department of Politics, Princeton University, Princeton, NJ 08544
Ying Lu: Department of Sociology, University of Colorado at Boulder, Boulder, CO 80309, e-mail: ying.lu@colorado.edu
Aaron Strauss: Department of Politics, Princeton University, Princeton, NJ 08544, e-mail: abstraus@princeton.edu
*: e-mail: kimai@princeton.edu (corresponding author)

Abstract

Ecological inference is a statistical problem where aggregate-level data are used to make inferences about individual-level behavior. In this article, we conduct a theoretical and empirical study of Bayesian and likelihood inference for 2 × 2 ecological tables by applying the general statistical framework of incomplete data. We first show that the ecological inference problem can be decomposed into three factors: distributional effects, which address the possible misspecification of parametric modeling assumptions about the unknown distribution of missing data; contextual effects, which represent the possible correlation between missing data and observed variables; and aggregation effects, which are directly related to the loss of information caused by data aggregation. We then examine how these three factors affect inference and offer new statistical methods to address each of them. To deal with distributional effects, we propose a nonparametric Bayesian model based on a Dirichlet process prior, which relaxes common parametric assumptions. We also identify the statistical adjustments necessary to account for contextual effects. Finally, although little can be done to cope with aggregation effects, we offer a method to quantify the magnitude of such effects in order to formally assess their severity. We use simulated and real data sets to empirically investigate the consequences of these three factors and to evaluate the performance of our proposed methods. C code, along with an easy-to-use R interface, is publicly available for implementing our proposed methods (Imai, Lu, and Strauss, forthcoming).
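
As a rough illustration of why data aggregation loses information, the deterministic bounds of Duncan and Davis (1953) can be computed from the margins of a single 2 × 2 table. The sketch below is not the authors' method and not the interface of their software. The notation is an assumption introduced here: X is the observed row margin, Y is the observed column margin, and W1 and W2 are the unobserved within-row fractions, linked by the accounting identity Y = W1·X + W2·(1 − X). The function name is hypothetical.

```python
def duncan_davis_bounds(x, y):
    """Deterministic bounds on the unobserved fractions (w1, w2).

    Assumed setup (not taken from the article text):
      x  : observed row margin, strictly between 0 and 1
      y  : observed column margin, between 0 and 1
      w1 : unobserved fraction within the first row group
      w2 : unobserved fraction within the second row group
    The margins constrain the unknowns through y = w1 * x + w2 * (1 - x).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie between 0 and 1")

    # Solve the identity for w1 and let w2 range over [0, 1].
    w1_lower = max(0.0, (y - (1.0 - x)) / x)
    w1_upper = min(1.0, y / x)

    # Solve the identity for w2 and let w1 range over [0, 1].
    w2_lower = max(0.0, (y - x) / (1.0 - x))
    w2_upper = min(1.0, y / (1.0 - x))

    return (w1_lower, w1_upper), (w2_lower, w2_upper)


if __name__ == "__main__":
    # A unit where the first group is 75% of the population and the
    # aggregate outcome rate is 50%.
    w1_bounds, w2_bounds = duncan_davis_bounds(0.75, 0.5)
    print("w1 bounds:", w1_bounds)
    print("w2 bounds:", w2_bounds)
```

In this example the margins narrow W1 to an interval of width one third, while W2 remains anywhere in [0, 1]. The width of such intervals gives an informal sense of how much individual-level information is lost in a given unit; the article's formal measure of aggregation effects is a separate, model-based quantity.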

Type: Research Article
Information: Political Analysis, Volume 16, Issue 1, Winter 2008, pp. 41–69

DOI: https://doi.org/10.1093/pan/mpm017
Copyright: © The Author 2007. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: This article is based in part on two working papers by Imai and Lu, “Parametric and Nonparametric Bayesian Models for Ecological Inference in 2 × 2 Tables” and “Quantifying Missing Information in Ecological Inference.” Various versions of these papers were presented at the 2004 Joint Statistical Meetings, the Second Cape Cod Monte Carlo Workshop, the 2004 Annual Political Methodology Summer Meeting, and the 2005 Annual Meeting of the American Political Science Association. We thank anonymous referees, Larry Bartels, Wendy Tam Cho, Jianqing Fan, Gary King, Xiao-Li Meng, Kevin Quinn, Phil Shively, David van Dyk, Jon Wakefield, and seminar participants at New York University (the Northeast Political Methodology conference), at Princeton University (Economics Department and Office of Population Research), and at the University of Virginia (Statistics Department) for helpful comments.

References

Achen, C. H., and Shively, W. P. 1995. Cross-level inference. Chicago, IL: University of Chicago Press.

Antoniak, C. E. 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics 2: 1152–74.

Benoit, K., and King, G. 2003. EzI: A(n easy) program for ecological inference. Cambridge, MA: Harvard University. Available from: http://gking.harvard.edu (accessed August 8, 2007).

Brown, P. J., and Payne, C. D. 1986. Aggregate data, ecological regression, and voting transitions. Journal of the American Statistical Association 81: 452–60.

Burden, B. C., and Kimball, D. C. 1998. A new approach to the study of ticket splitting. American Political Science Review 92: 533–44.

Bush, C. A., and MacEachern, S. N. 1996. A semiparametric Bayesian model for randomized block designs. Biometrika 83: 275–85.

Cho, W. K. T. 1998. Iff the assumption fits …: A comment on the King ecological inference solution. Political Analysis 7: 143–63.

Cho, W. K. T., and Gaines, B. J. 2004. The limits of ecological inference: The case of split-ticket voting. American Journal of Political Science 48: 152–71.

Copas, J., and Eguchi, S. 2005. Local model uncertainty and incomplete-data bias. Journal of the Royal Statistical Society, Series B (Methodological) 67: 459–513.

Cross, P. J., and Manski, C. F. 2002. Regressions, short and long. Econometrica 70: 357–68.

Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B (Methodological) 39: 1–37.

Dey, D., Müller, P., and Sinha, D., eds. 1998. Practical nonparametric and semiparametric Bayesian statistics. New York: Springer-Verlag Inc.

Duncan, O. D., and Davis, B. 1953. An alternative to ecological correlation. American Sociological Review 18: 665–6.

Escobar, M. D., and West, M. 1995. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90: 577–88.

Ferguson, T. S. 1973. A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1: 209–30.

Freedman, D. A., Klein, S. P., Sacks, J., Smyth, C. A., and Everett, C. G. 1991. Ecological regression and voting rights (with discussion). Evaluation Review 15: 673–816.

Freedman, D. A., Ostland, M., Roberts, M. R., and Klein, S. P. 1998. Review of “A Solution to the Ecological Inference Problem.” Journal of the American Statistical Association 93: 1518–22.

Gelman, A., Park, D. K., Ansolabehere, S., Price, P. N., and Minnite, L. C. 2001. Models, assumptions and model checking in ecological regressions. Journal of the Royal Statistical Society, Series A 164: 101–18.

Gill, J., and Casella, G. 2006. Markov chain Monte Carlo methods for models with nonparametric priors. Technical report, University of California, Davis.

Goodman, L. 1953. Ecological regressions and behavior of individuals. American Sociological Review 18: 663–6.

Grofman, B. 1991. Statistics without substance: A critique of Freedman et al. and Clark and Morrison. Evaluation Review 15: 746–69.

Heitjan, D. F., and Rubin, D. B. 1991. Ignorability and coarse data. The Annals of Statistics 19: 2244–53.

Herron, M. C., and Shotts, K. W. 2004. Logical inconsistency in EI-based second stage regressions. American Journal of Political Science 48: 172–83.

Imai, K., and King, G. 2004. Did illegal overseas absentee ballots decide the 2000 U.S. presidential election? Perspectives on Politics 2: 537–49.

Imai, K., Lu, Y., and Strauss, A. eco: R package for ecological inference in 2 × 2 tables. Journal of Statistical Software (forthcoming).

Judge, G. G., Miller, D. J., and Cho, W. K. T. 2004. An information theoretic approach to ecological estimation and inference. In Ecological inference: New methodological strategies, ed. King, G., Rosen, O., and Tanner, M., 162–87. Cambridge: Cambridge University Press.

King, G. 1997. A solution to the ecological inference problem: Reconstructing individual behavior from aggregate data. Princeton, NJ: Princeton University Press.

King, G. 1999. Comment on “review of ‘a solution to the ecological inference problem’.” Journal of the American Statistical Association 94: 352–5.

King, G., Rosen, O., and Tanner, M. A. 1999. Binomial-beta hierarchical models for ecological inference. Sociological Methods & Research 28: 61–90.

King, G., Rosen, O., and Tanner, M. A., eds. 2004. Ecological inference: New methodological strategies. Cambridge: Cambridge University Press.

Kong, A., Meng, X.-L., and Nicolae, D. L. 2005. Quantifying relative incomplete information for hypothesis testing in statistical and genetic studies. Unpublished manuscript, Department of Statistics, Harvard University.

Larson, R., Hostetler, R. P., and Edwards, B. H. 2002. Calculus: Early transcendental functions. 3rd ed. Boston, MA: Houghton Mifflin Company.

Meng, X.-L., and Rubin, D. B. 1991. Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association 86: 899–909.

Meng, X.-L., and Rubin, D. B. 1993. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80: 267–78.

Mukhopadhyay, S., and Gelfand, A. E. 1997. Dirichlet process mixed generalized linear models. Journal of the American Statistical Association 92: 633–9.

Neyman, J., and Scott, E. L. 1948. Consistent estimation from partially consistent observations. Econometrica 16: 1–32.

Orchard, T., and Woodbury, M. A. 1972. A missing information principle: Theory and applications. Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability 1: 697–715.

Robinson, W. S. 1950. Ecological correlations and the behavior of individuals. American Sociological Review 15: 351–7.

Rosen, O., Jiang, W., King, G., and Tanner, M. A. 2001. Bayesian and frequentist inference for ecological inference: The R × C case. Statistica Neerlandica 55: 134–56.

van Dyk, D. A., Meng, X.-L., and Rubin, D. B. 1995. Maximum likelihood estimation via the ECM algorithm: Computing the asymptotic variance. Statistica Sinica 5: 55–75.

Wakefield, J. 2004a. Ecological inference for 2 × 2 tables (with discussion). Journal of the Royal Statistical Society, Series A 167: 385–445.

Wakefield, J. 2004b. Prior and likelihood choices in the analysis of ecological data. In Ecological inference: New methodological strategies, ed. King, G., Rosen, O., and Tanner, M., 13–50. Cambridge: Cambridge University Press.

West, M., Müller, P., and Escobar, M. D. 1994. Hierarchical priors and mixture models, with application in regression and density estimation. In Aspects of uncertainty: A tribute to D. V. Lindley, ed. Smith, A. F. M., and Freeman, P. R., 363–86. London: John Wiley & Sons.
