A Bootstrap Method for Conducting Statistical Inference with Clustered Data

Jeffrey J. Harden

doi:10.1177/1532440011406233

Published online by Cambridge University Press: 25 January 2021

Affiliation: University of North Carolina at Chapel Hill, USA
*Jeffrey J. Harden, University of North Carolina at Chapel Hill, Department of Political Science, 312 Hamilton Hall, CB #3265, Chapel Hill, NC 27599. Email: jjharden@unc.edu

Abstract

U.S. state politics researchers often analyze data with observations grouped into clusters. This structure commonly produces unmodeled correlation within clusters, leading to downward bias in the standard errors of regression coefficients. Estimating robust cluster standard errors (RCSE) is a common approach to correcting this bias. However, despite their frequent use, recent work indicates that RCSE can also be biased downward. Here the author provides evidence of that bias and offers a potential solution. Through Monte Carlo simulation of an ordinary least squares (OLS) regression model, the author compares conventional standard error (OLS-SE) and RCSE performance to that of a bootstrap method that resamples clusters of observations (BCSE). The author shows that both OLS-SE and RCSE are biased downward, with OLS-SE being the most biased. In contrast, BCSE are not biased and consistently outperform the other two methods. The author concludes with three replications from recent work and offers recommendations to researchers.
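The idea of a bootstrap that resamples clusters can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not the author's implementation: the function names (`cluster_bootstrap_se`, `conventional_se`, `simulate_clustered`), the simulated data-generating process, and the number of replicates are all hypothetical choices made for this sketch. It draws whole clusters with replacement, refits OLS on each resample, and takes the standard deviation of the replicate coefficients as the standard error.

```python
import numpy as np


def ols_coefs(X, y):
    """Return OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def conventional_se(X, y):
    """Conventional OLS standard errors, which assume independent errors."""
    beta = ols_coefs(X, y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))


def cluster_bootstrap_se(X, y, cluster_ids, n_boot=500, seed=0):
    """Standard errors from resampling whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster_ids)
    # Row indices belonging to each cluster, computed once.
    rows = [np.flatnonzero(cluster_ids == c) for c in clusters]
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        # Draw as many clusters as the original sample has, with replacement.
        picked = rng.integers(0, len(clusters), size=len(clusters))
        idx = np.concatenate([rows[i] for i in picked])
        draws[b] = ols_coefs(X[idx], y[idx])
    # Bootstrap SE: standard deviation of the replicate coefficients.
    return draws.std(axis=0, ddof=1)


def simulate_clustered(n_clusters=40, n_per=25, seed=1):
    """Hypothetical data: cluster-level covariate plus a shared cluster error."""
    rng = np.random.default_rng(seed)
    g = np.repeat(np.arange(n_clusters), n_per)
    x = rng.normal(size=n_clusters)[g]   # covariate constant within cluster
    u = rng.normal(size=n_clusters)[g]   # unmodeled within-cluster correlation
    e = rng.normal(size=n_clusters * n_per)
    y = 1.0 + 0.5 * x + u + e
    X = np.column_stack([np.ones_like(x), x])
    return X, y, g


if __name__ == "__main__":
    X, y, g = simulate_clustered()
    print("OLS-SE:", conventional_se(X, y))
    print("BCSE:  ", cluster_bootstrap_se(X, y, g))
```

In this simulated setup the covariate varies only across clusters and the errors share a cluster component, so the conventional standard error on the slope should come out noticeably smaller than the cluster bootstrap one. This matches the direction of the bias described in the abstract, but it is not a replication of the article's Monte Carlo results.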

Keywords

clustered data; standard errors; bootstrapping

Type: Research Article
Information: State Politics & Policy Quarterly, Volume 11, Issue 2, June 2011, pp. 223–246

DOI: https://doi.org/10.1177/1532440011406233
Copyright: Copyright © The Author(s) 2011
