
Semi-parametric Selection Models for Potentially Non-ignorable Attrition in Panel Studies with Refreshment Samples

Yajuan Si, Jerome P. Reiter and D. Sunshine Hillygus


Panel studies typically suffer from attrition. Ignoring the attrition can result in biased inferences if the missing data are systematically related to outcomes of interest. Unfortunately, panel data alone cannot inform the extent of bias due to attrition. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during the later waves of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by non-ignorable attrition while reducing reliance on strong assumptions about the attrition process. We present a Bayesian approach to handle attrition in two-wave panels with one refreshment sample and many categorical survey variables. The approach includes (1) an additive non-ignorable selection model for the attrition process; and (2) a Dirichlet process mixture of multinomial distributions for the categorical survey variables. We present Markov chain Monte Carlo algorithms for sampling from the posterior distribution of model parameters and missing data. We apply the model to correct attrition bias in an analysis of data from the 2007–08 Associated Press/Yahoo News election panel study.
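The sketch below shows the data structure these two model components act on. It simulates the data; it does not fit the model. It assumes one binary survey item per wave and no time-invariant covariates, for simplicity. The items are drawn from a Dirichlet process mixture of products of multinomials, using a truncated stick-breaking representation (Sethuraman 1994; Ishwaran and James 2001) with an arbitrarily chosen truncation level K = 20. Attrition comes from an additive selection model: the wave-2 retention indicator depends on Y1 and Y2, with no Y1-by-Y2 interaction. The logistic link, the parameter values and all function names are hypothetical illustration choices. This is not the authors' MCMC algorithm, which samples from the posterior distribution of the model parameters and the missing data.

```python
import numpy as np


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last component takes the leftover stick
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * leftover  # nonnegative and sums to 1


def simulate_dp_multinomial(n, n_vars, n_levels, alpha, K, rng):
    """Draw n records of n_vars categorical items from a truncated DP mixture
    of products of multinomials (a latent class model)."""
    w = stick_breaking_weights(alpha, K, rng)
    # phi[k, j, :] holds the level probabilities of item j in latent class k
    phi = rng.dirichlet(np.ones(n_levels), size=(K, n_vars))
    z = rng.choice(K, size=n, p=w)  # latent class of each record
    # Items are independent given the class: sample each by inverse CDF
    cdf = np.cumsum(phi[z], axis=2)
    cdf[..., -1] = 1.0  # guard against round-off in the last level
    u = rng.random((n, n_vars, 1))
    return (u > cdf).sum(axis=2)


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_panel_with_refreshment(n_panel, n_refresh, sel, seed=0):
    """Two-wave panel plus one refreshment sample, one binary item per wave.

    sel = (a0, a1, a2) defines the hypothetical additive selection model
    Pr(W = 1 | Y1, Y2) = expit(a0 + a1 * Y1 + a2 * Y2), with no Y1 * Y2 term.
    """
    rng = np.random.default_rng(seed)
    # Panel and refreshment units come from the same population distribution
    y = simulate_dp_multinomial(n_panel + n_refresh, 2, 2, alpha=1.0, K=20, rng=rng)
    panel, refresh = y[:n_panel], y[n_panel:]
    a0, a1, a2 = sel
    # W = 1 means the panelist stays in for wave 2
    stayed = rng.random(n_panel) < expit(a0 + a1 * panel[:, 0] + a2 * panel[:, 1])
    return {
        "y1_panel": panel[:, 0],
        "y2_panel_obs": np.where(stayed, panel[:, 1], -1),  # -1 = lost to attrition
        "stayed": stayed,
        "y1_refresh": np.full(n_refresh, -1),  # refreshment units skip wave 1
        "y2_refresh": refresh[:, 1],
        "y2_panel_true": panel[:, 1],  # known only because the data are simulated
    }


if __name__ == "__main__":
    d = simulate_panel_with_refreshment(20000, 20000, sel=(0.5, 0.0, -1.5))
    print("complete-case P(Y2=1):", d["y2_panel_obs"][d["stayed"]].mean())
    print("refreshment P(Y2=1):  ", d["y2_refresh"].mean())
    print("true panel P(Y2=1):   ", d["y2_panel_true"].mean())
```

With the illustrative selection parameters (0.5, 0.0, -1.5), panelists with Y2 = 1 are less likely to stay. The complete-case mean of Y2 among stayers therefore tends to understate P(Y2 = 1). The refreshment-sample mean is unaffected by attrition. This contrast between the two samples is the information refreshment samples add.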



Authors' note: Replication materials are available in Si et al. (2014).



References
Albert, J. H., and Chib, S. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88(422): 669–79.
Allison, P. 2000. Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research 28:301–9.
Bartels, L. M. 1999. Panel effects in the American National Election Studies. Political Analysis 8(1): 1–20.
Behr, A., Bellgardt, E., and Rendtel, U. 2005. Extent and determinants of panel attrition in the European Community Household Panel. European Sociological Review 21(5): 489–512.
Bhattacharya, D. 2008a. Inference in panel data models under attrition caused by unobservables. Journal of Econometrics 144(2): 430–46.
Bhattacharya, D. 2008b. Inference in panel data models under attrition caused by unobservables. Journal of Econometrics 144:430–46.
Brehm, J. 1993. The phantom respondents: Opinion surveys and political representation. Ann Arbor: University of Michigan Press.
Brown, C. H. 1990. Protecting against nonrandomly missing data in longitudinal studies. Biometrics 46(1): 143–55.
Burgette, L. F., and Reiter, J. P. 2010. Multiple imputation via sequential regression trees. American Journal of Epidemiology 172:1070–6.
Callegaro, M., and DiSogra, C. 2008. Computing response metrics for online panels. Public Opinion Quarterly 72(5): 1008–32.
Clinton, J. 2001. Panel bias from attrition and conditioning: A case study of the Knowledge Networks panel. In AAPOR 55th Annual Conference.
Cranmer, S. J., and Gill, J. 2013. We have to be discrete about this: A non-parametric imputation technique for missing categorical data. British Journal of Political Science 43(2): 425–49.
Deng, Y., Hillygus, D. S., Reiter, J. P., Si, Y., and Zheng, S. 2013. Handling attrition in longitudinal studies: The case for refreshment samples. Statistical Science 28(2): 238–56.
Diggle, P., and Kenward, M. G. 1994. Informative dropout in longitudinal data analysis. Journal of the Royal Statistical Society Series C (Applied Statistics) 43(1): 49–93.
Dunson, D. B., and Xing, C. 2009. Nonparametric Bayes modeling of multivariate categorical data. Journal of the American Statistical Association 104:1042–51.
Erosheva, E. A., Fienberg, S. E., and Junker, B. W. 2002. Alternative statistical models and representations for large sparse multi-dimensional contingency tables. Annales de la Faculté des Sciences de Toulouse 11(4): 485–505.
Frankel, L., and Hillygus, S. 2013. Looking beyond demographics: Panel attrition in the ANES and GSS. Political Analysis 22(1): 1–18.
Frick, J. R., Goebel, J., Schechtman, E., Wagner, G. G., and Yitzhaki, S. 2006. Using analysis of Gini (ANOGI) for detecting whether two subsamples represent the same universe: The German Socio-Economic Panel Study (SOEP) experience. Sociological Methods and Research 34:427–68.
Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., and Meulders, M. 2005. Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics 61:74–85.
Grimmer, J. 2010. A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis 18(1): 1–35.
Hausman, J. A., and Wise, D. A. 1979. Attrition bias in experimental and panel data: The Gary income maintenance experiment. Econometrica 47(2): 455–73.
He, Y., Zaslavsky, A. M., and Landrum, M. B. 2010. Multiple imputation in a large-scale complex survey: A guide. Statistical Methods in Medical Research 19:653–70.
Heeringa, S. 1997. Russia longitudinal monitoring survey sample attrition, replenishment, and weighting: Rounds V-VII. University of Michigan Institute for Social Research.
Henderson, M., and Hillygus, D. S. 2011. The dynamics of health care opinion, 2008–2010: Partisanship, self-interest, and racial resentment. Journal of Health Politics, Policy, and Law 36(6): 945–60.
Henderson, M., Hillygus, D., and Tompson, T. 2010. “Sour grapes” or rational voting? Voter decision making among thwarted primary voters in 2008. Public Opinion Quarterly 74(3): 499–529.
Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. 1998. Combining panel data sets with attrition and refreshment samples. Technical report 230, National Bureau of Economic Research.
Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. 2001. Combining panel data sets with attrition and refreshment samples. Econometrica 69:1645–59.
Hogan, J. W., and Daniels, M. J. 2008. Missing data in longitudinal studies. Boca Raton, FL: Chapman and Hall.
Holmes, C. C., and Held, L. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis 1(1): 145–68.
Honaker, J., and King, G. 2010. What to do about missing values in time-series cross-section data. American Journal of Political Science 54(2): 561–81.
Ishwaran, H., and James, L. F. 2001. Gibbs sampling for stick-breaking priors. Journal of the American Statistical Association 96:161–73.
Iyengar, S., Sood, G., and Lelkes, Y. 2012. Affect, not ideology: A social identity perspective on polarization. Public Opinion Quarterly 76(3): 405–31.
Keeter, S., Kennedy, C., Dimock, M., Best, J., and Craighill, P. 2006. Gauging the impact of growing nonresponse on estimates from a national RDD telephone survey. Public Opinion Quarterly 70(5): 759–79.
Kenward, M. G. 1998. Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine 17:2723–32.
Kenward, M. G., Molenberghs, G., and Thijs, H. 2003. Pattern-mixture models with proper time dependence. Biometrika 90:52–71.
King, G., Honaker, J., Joseph, A., and Scheve, K. 2001. Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95:49–69.
Kish, L., and Hess, I. 1959. A “replacement” procedure for reducing the bias of nonresponse. American Statistician 13:17–19.
Kropko, J., Goodrich, B., Gelman, A., and Hill, J. 2014. Multiple imputation for continuous and categorical data: Comparing joint multivariate normal and conditional approaches. Political Analysis. Published online doi:10.1093/pan/mpu007.
Kruse, Y., Callegaro, M., Dennis, J., Subias, S., Lawrence, M., DiSogra, C., and Tompson, T. 2009. Panel conditioning and attrition in the AP-Yahoo! News Election Panel Study. In 64th Conference of the American Association for Public Opinion Research (AAPOR). Hollywood, FL.
Kyung, M., Gill, J., and Casella, G. 2011. New findings from terrorism data: Dirichlet process random effects models for latent groups. Journal of the Royal Statistical Society Series C (Applied Statistics) 60:701–21.
Lin, I., and Schaeffer, N. C. 1995. Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly 59:236–58.
Little, R. J. A. 1993. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88:125–34.
Little, R. J. A., and Rubin, D. B. 2002. Statistical Analysis with Missing Data. 2nd ed. New York: John Wiley & Sons.
Little, R. J. A., and Wang, Y. 1996. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics 52(1): 98–111.
Meng, X. 1994. Posterior predictive p-values. Annals of Statistics 22:1142–60.
Olsen, R. J. 2005. The problem of respondent attrition: Survey methodology is key. Monthly Labor Review 128:63–71.
Olson, K., and Witt, L. 2011. Are we keeping the people who used to stay? Changes in correlates of panel survey attrition over time. Social Science Research 40(4): 1037–50.
Papaspiliopoulos, O. 2008. A note on posterior sampling from Dirichlet mixture models. Technical report, Centre for Research in Statistical Methodology, University of Warwick.
Pasek, J., Tahk, A., Lelkes, Y., Krosnick, J. A., Payne, B. K., Akhtar, O., and Tompson, T. 2009. Determinants of turnout and candidate choice in the 2008 US presidential election: Illuminating the impact of racial prejudice and other considerations. Public Opinion Quarterly 73(5): 943–94.
Prior, M. 2010. You've either got it or you don't? The stability of political interest over the life cycle. Journal of Politics 72:747–66.
Reiter, J. P., Raghunathan, T. E., and Kinney, S. 2006. The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology 32(2): 143–50.
Ridder, G. 1992. An empirical evaluation of some models for non-random attrition in panel data. Structural Change and Economic Dynamics 3:337–55.
Rubin, D. B. 1987. Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.
Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. 1999. Adjusting for nonignorable dropout using semiparametric nonresponse models. Journal of the American Statistical Association 94(448): 1096–120.
Schluchter, M. D. 1992. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine 11(14): 1861–70.
Sethuraman, J. 1994. A constructive definition of Dirichlet priors. Statistica Sinica 4:639–50.
Si, Y., and Reiter, J. P. 2013. Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of Educational and Behavioral Statistics 38(5): 499–521.
Si, Y., Reiter, J. P., and Hillygus, D. S. 2014. Replication data for: Semi-parametric selection models for potentially nonignorable attrition in panel studies with refreshment samples. (accessed April 19, 2014). IQSS Dataverse Network, V1.
Su, Y.-S., Yajima, M., Gelman, A. E., and Hill, J. 2011. Multiple imputation with diagnostics (mi) in R: Opening windows into the black box. Journal of Statistical Software 45(2): 1–31.
Thompson, M., Fong, G., Hammond, D., Boudreau, C., Driezen, P., Hyland, A., Borland, R., Cummings, K., Hastings, G., and Siahpush, M. 2006. Methods of the International Tobacco Control (ITC) four-country survey. Tobacco Control 15(Suppl. 3): iii12–iii18.
Traugott, M. W., and Tucker, C. 1984. Strategies for predicting whether a citizen will vote and estimation of electoral outcomes. Public Opinion Quarterly 48(1): 330–43.
Vehovar, V. 1999. Field substitution and unit nonresponse. Journal of Official Statistics 15:335–50.
Vermunt, J. K., Van Ginkel, J. R., Van der Ark, L. A., and Sijtsma, K. 2008. Multiple imputation of incomplete categorical data using latent class analysis. Sociological Methodology 38(1): 369–97.
Walker, S. G. 2007. Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation 36:45–54.
Wawro, G. 2002. Estimating dynamic panel data models in political science. Political Analysis 10:25–48.
Wissen, L., and Meurs, H. 1989. The Dutch mobility panel: Experiences and evaluation. Transportation 16:99–119.
Zabel, J. 1998. An analysis of attrition in the Panel Study of Income Dynamics and the Survey of Income and Program Participation with an application to a model of labor market behavior. Journal of Human Resources 33:479–506.