Practical and Effective Approaches to Dealing With Clustered Data

Published online by Cambridge University Press:  19 January 2018

Abstract

Cluster-robust standard errors (as implemented by the eponymous cluster option in Stata) can produce misleading inferences when the number of clusters G is small, even if the model is consistent and there are many observations in each cluster. Nevertheless, political scientists commonly employ this method in data sets with few clusters. The contributions of this paper are: (a) developing new and easy-to-use Stata and R packages that implement alternative uncertainty measures robust to small G, and (b) explaining and providing evidence for the advantages of these alternatives, especially cluster-adjusted t-statistics based on Ibragimov and Müller. To illustrate these advantages, we reanalyze recent work where results are based on cluster-robust standard errors.
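The cluster-adjusted t-statistic of Ibragimov and Müller mentioned in the abstract can be sketched in a few lines. The idea is to estimate the model separately within each cluster, then treat the G resulting estimates as the sample for an ordinary one-sample t-test with G − 1 degrees of freedom. The code below is only a minimal illustration under simplifying assumptions (a bivariate linear regression, and clusters with enough within-cluster variation to estimate a slope). It is not the authors' Stata or R package, and the names `ols_slope` and `cluster_adjusted_t` are hypothetical.

```python
import math
import statistics


def ols_slope(xs, ys):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation within this cluster")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def cluster_adjusted_t(clusters, null_value=0.0):
    """Illustrative cluster-adjusted t-statistic.

    `clusters` maps a cluster label to a pair (xs, ys) of equal-length lists.
    Assumption: the model is estimable in every cluster.
    """
    # Step 1: one slope estimate per cluster.
    slopes = [ols_slope(xs, ys) for xs, ys in clusters.values()]
    g = len(slopes)
    if g < 2:
        raise ValueError("at least two clusters are required")

    # Step 2: one-sample t-test on the G cluster-level estimates.
    estimate = statistics.fmean(slopes)
    std_error = statistics.stdev(slopes) / math.sqrt(g)
    if std_error == 0:
        raise ValueError("cluster-level estimates are identical")
    t_stat = (estimate - null_value) / std_error

    # Compare t_stat with a Student's t distribution on G - 1 degrees of freedom.
    return {"estimate": estimate, "se": std_error, "t": t_stat, "df": g - 1}
```

The returned `t` would be compared against critical values of Student's t with `df` degrees of freedom, so inference reflects the number of clusters rather than the number of observations.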

Type
Original Articles
Copyright
© The European Political Science Association 2018 

Footnotes

*

Justin Esarey is an Assistant Professor of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 (justin@justinesarey.com). Andrew Menger is a Ph.D. Candidate, Department of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 (Andrew.M.Menger@rice.edu). The authors thank Ulrich Müller, Carlisle Rainey, Jonathan Kropko, Matthew Webb, Neal Beck, Jens Hainmueller, Shuai Jin, Jens Grosser, Ernesto Reuben, our anonymous reviewers, and participants at the 2015 Annual Meeting of the Midwest Political Science Association, the 2015 Annual Meeting of the Society for Political Methodology, and the 2016 Annual Meeting of the Southern Political Science Association for helpful comments and suggestions on earlier drafts of this paper. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2017.42

References

Anderson, Theodore W. 2003. An Introduction to Multivariate Statistical Analysis, 3rd ed. New York, NY: Wiley.
Angrist, Joshua D., and Pischke, Jorn-Steffen. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.
Arellano, Manuel. 1987. ‘Computing Robust Standard Errors for Within-Groups Estimators’. Oxford Bulletin of Economics and Statistics 49(4):431–434.
Bafumi, Joseph, and Gelman, Andrew. 2006. ‘Fitting Multilevel Models When Predictors and Group Effects Correlate’. Available at http://goo.gl/usvQsn, accessed 21 December 2017.
Bakirov, Nail K., and Szekely, Gabor J. 2006. ‘Student’s t-Test for Gaussian Scale Mixtures’. Journal of Mathematical Sciences 139(3):6497–6505.
Bates, Douglas, Maechler, Martin, Bolker, Ben, and Walker, Steven. 2014. ‘lme4: Linear Mixed Effects Models Using Eigen and S4’. R package version 1.1-7. Available at http://CRAN.R-project.org/package=lme4, accessed 21 December 2017.
Beck, Nathaniel, and Katz, Jonathan N. 1995. ‘What To Do (And Not To Do) With Time-Series Cross-Section Data’. American Political Science Review 89(3):634–647.
Beck, Nathaniel L., Katz, Jonathan N., and Mignozzetti, Umberto G. 2014. ‘Of Nickell Bias and its Cures: Comment on Gaibulloev, Sandler, and Sul’. Political Analysis 22(2):274–278.
Bertrand, Marianne, Duflo, Esther, and Mullainathan, Sendhil. 2004. ‘How Much Should We Trust Differences-In-Differences Estimates?’. The Quarterly Journal of Economics 119(1):249–275.
Brambor, Thomas, Clark, William Roberts, and Golder, Matthew. 2006. ‘Understanding Interaction Models: Improving Empirical Analyses’. Political Analysis 14(1):63–82.
Cameron, A. Colin, and Miller, Douglas L. 2015. ‘A Practitioner’s Guide to Cluster-Robust Inference’. Journal of Human Resources 50(2):317–372.
Cameron, A. Colin, Gelbach, Jonah B., and Miller, Douglas L. 2008. ‘Bootstrap-Based Improvements for Inference With Clustered Errors’. Review of Economics and Statistics 90(3):414–427.
Cameron, A. Colin, and Trivedi, Pravin K. 2005. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press.
Canay, Ivan A., Romano, Joseph P., and Shaikh, Azeem M. 2014. ‘Randomization Tests Under an Approximate Symmetry Assumption’. Working Paper (version: December 19, 2014). Available at https://goo.gl/TUEQee, accessed 29 January 2017.
Clark, Tom S., and Linzer, Drew A. 2015. ‘Should I Use Fixed or Random Effects?’. Political Science Research and Methods 3(2):399–408.
Croissant, Yves. 2015. ‘Package “mlogit”.’ CRAN. Available at http://cran.r-project.org/web/packages/mlogit/mlogit.pdf, accessed 21 December 2017.
Croissant, Yves, and Millo, Giovanni. 2008. ‘Panel Data Econometrics in R: The plm Package’. Journal of Statistical Software 27(2):1–43.
Donald, Stephen G., and Lang, Kevin. 2007. ‘Inference With Difference-in-Differences and Other Panel Data’. The Review of Economics and Statistics 89(2):221–233.
Donner, Allan. 1998. ‘Some Aspects of the Design and Analysis of Cluster Randomization Trials’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(1):95–113.
Efron, Bradley. 1979. ‘Bootstrap Methods: Another Look at the Jackknife’. Annals of Statistics 7(1):1–26.
Field, Chris A., and Welsh, Alan H. 2007. ‘Bootstrapping Clustered Data’. Journal of the Royal Statistical Society: Series B 69(3):369–390.
Gaibulloev, Khusrav, Sandler, Todd, and Sul, Donggyu. 2014. ‘Dynamic Panel Analysis Under Cross-Sectional Dependence’. Political Analysis 22:258–273.
Green, Donald P., and Vavreck, Lynn. 2008. ‘Analysis of Cluster-Randomized Experiments: A Comparison of Alternative Estimation Approaches’. Political Analysis 16(2):138–152.
Grosser, Jens, Reuben, Ernesto, and Tymula, Agnieszka. 2013. ‘Political Quid Pro Quo Agreements: An Experimental Study’. American Journal of Political Science 57:582–597.
Hainmueller, Jens, Hiscox, Michael, and Sequeira, Sandra. 2015. ‘Consumer Demand for the Fair Trade Label: Evidence from a Multistore Field Experiment’. Review of Economics and Statistics 97(2):242–256.
Hansen, Christian B. 2007. ‘Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T is Large’. Journal of Econometrics 141(2):597–620.
Harden, Jeffrey J. 2011. ‘A Bootstrap Method for Conducting Statistical Inference With Clustered Data’. State Politics & Policy Quarterly 11(2):223–246.
Hardin, James W., and Hilbe, Joseph M. 2003. Generalized Estimating Equations. Boca Raton, FL: Chapman & Hall/CRC.
Horowitz, Joel L. 1997. ‘Bootstrap Methods in Econometrics: Theory and Numerical Performance’. In David M. Kreps and Kenneth F. Wallis (eds), Advances in Economics and Econometrics: Theory and Applications: Seventh World Congress, 189–222. Cambridge, UK: Cambridge University Press.
Hu, Feifang, and Kalbfleisch, John D. 2000. ‘The Estimating Function Bootstrap’. Canadian Journal of Statistics 28(3):449–481.
Ibragimov, Rustam, and Müller, Ulrich K. 2010. ‘t-Statistic Based Correlation and Heterogeneity Robust Inference’. Journal of Business & Economic Statistics 28(4):453–468.
Imbens, Guido W., and Kolesar, Michal. 2012. ‘Robust Standard Errors in Small Samples: Some Practical Advice’ 98(4):701–12.
Judge, George G., Hill, R. Carter, Griffiths, William E., Lutkepohl, Helmut, and Lee, Tsoung-Chao. 1988. Introduction to the Theory and Practice of Econometrics. New York, NY: Wiley.
Kezdi, Gabor. 2004. ‘Robust Standard Error Estimation in Fixed-Effects Panel Models’. Hungarian Statistical Review 9:95–116.
King, Gary, and Roberts, Margaret E. 2014. ‘How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About it’. Political Analysis 23:159–179.
Klar, Neil, and Donner, Allan. 2001. ‘Current and Future Challenges in the Design and Analysis of Cluster Randomization Trials’. Statistics in Medicine 20(24):3729–3740.
Lacina, Bethany. 2014. ‘How Governments Shape the Risk of Civil Violence: India’s Federal Reorganization, 1950–56’. American Journal of Political Science 58(3):720–738.
Liang, Kung-Yee, and Zeger, Scott L. 1986. ‘Longitudinal Data Analysis Using Generalized Linear Models’. Biometrika 73(1):13–22.
Liang, Kung-Yee, and Zeger, Scott L. 1993. ‘Regression Analysis for Correlated Data’. Annual Review of Public Health 14(1):43–68.
Liu, Regina Y. 1988. ‘Bootstrap Procedures Under Some Non-I.I.D. Models’. The Annals of Statistics 16(4):1696–1708.
Liu, Regina Y., and Singh, Kesar. 1987. ‘On a Partial Correction by the Bootstrap’. The Annals of Statistics 15(4):1713–1718.
MacKinnon, James G. 2015. ‘Wild Cluster Bootstrap Confidence Intervals’. L’Actualité économique 91(1-2):11–33.
MacKinnon, James G., and Webb, Matthew D. 2017. ‘Wild Bootstrap Inference for Wildly Different Cluster Sizes’. Journal of Applied Econometrics 32(2):233–254.
Mancl, Lloyd A., and DeRouen, Timothy A. 2001. ‘A Covariance Estimator for GEE With Improved Small-Sample Properties’. Biometrics 57(1):126–134.
Moulton, Brent R. 1986. ‘Random Group Effects and the Precision of Regression Estimates’. Journal of Econometrics 32(3):385–397.
Moulton, Brent R. 1990. ‘An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units’. The Review of Economics and Statistics 72(2):334–338.
Nickell, Stephen. 1981. ‘Biases in Dynamic Models With Fixed Effects’. Econometrica 49:1417–1426.
Rogers, William. 1993. ‘Regression Standard Errors in Clustered Samples’. Stata Technical Bulletin 13:19–23.
van der Vaart, Aad W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.
White, Halbert. 1980. ‘A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity’. Econometrica 48(4):817–838.
Williams, Rick L. 2000. ‘A Note on Robust Variance Estimation for Cluster-Correlated Data’. Biometrics 56(2):645–646.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Wu, C. F. Jeff. 1986. ‘Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis’. The Annals of Statistics 14(4):1261–1295.

Supplementary material

Esarey and Menger Dataset (link)
Esarey and Menger supplementary material (PDF, 594.5 KB)