Semi-parametric Selection Models for Potentially Non-ignorable Attrition in Panel Studies with Refreshment Samples

Yajuan Si; Jerome P. Reiter; D. Sunshine Hillygus

doi:10.1093/pan/mpu009

Published online by Cambridge University Press: 04 January 2017

Yajuan Si*: Department of Statistics, MC 4690, Columbia University, New York, NY 10027, USA
Jerome P. Reiter: Department of Statistical Science, Box 90251, Duke University, Durham, NC 27708, USA
D. Sunshine Hillygus: Department of Political Science, Box 90204, Duke University, Durham, NC 27708, USA
*E-mail: ysi@stat.columbia.edu

Abstract

Panel studies typically suffer from attrition. Ignoring the attrition can result in biased inferences if the missing data are systematically related to outcomes of interest. Unfortunately, panel data alone cannot inform the extent of bias due to attrition. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during the later waves of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by non-ignorable attrition while reducing reliance on strong assumptions about the attrition process. We present a Bayesian approach to handle attrition in two-wave panels with one refreshment sample and many categorical survey variables. The approach includes (1) an additive non-ignorable selection model for the attrition process; and (2) a Dirichlet process mixture of multinomial distributions for the categorical survey variables. We present Markov chain Monte Carlo algorithms for sampling from the posterior distribution of model parameters and missing data. We apply the model to correct attrition bias in an analysis of data from the 2007–08 Associated Press/Yahoo News election panel study.
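The bias mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Bayesian model (it has no Dirichlet process mixture and no MCMC); it only assumes a single binary survey variable measured in two waves, a logistic link for the additive selection model, and hypothetical parameter values (`A0`, `A1`, `A2`, `P_Y1`, `P_Y2_GIVEN_Y1`) picked to make the effect visible. Under those assumptions, the wave-2 mean among panel completers drifts away from the population value, while a refreshment sample drawn at wave 2 stays close to it.

```python
import math
import random

# Hypothetical values, chosen only to make the attrition bias visible.
A0, A1, A2 = 0.5, 0.5, -1.5       # intercept, wave-1 effect, wave-2 effect
P_Y1 = 0.5                         # P(Y1 = 1)
P_Y2_GIVEN_Y1 = {0: 0.3, 1: 0.7}   # P(Y2 = 1 | Y1)


def draw_person(rng):
    """Draw one person's binary responses (y1, y2) from the population."""
    y1 = int(rng.random() < P_Y1)
    y2 = int(rng.random() < P_Y2_GIVEN_Y1[y1])
    return y1, y2


def stays_in_panel(y1, y2, rng):
    """Additive selection model: the logit has no y1*y2 interaction.

    Attrition is non-ignorable because the probability of staying
    depends on y2, which is unobserved for people who drop out.
    """
    p_stay = 1.0 / (1.0 + math.exp(-(A0 + A1 * y1 + A2 * y2)))
    return rng.random() < p_stay


def simulate(n_panel=20000, n_refresh=5000, seed=1):
    """Return the full panel, the wave-2 completers, and a refreshment sample."""
    rng = random.Random(seed)
    panel = [draw_person(rng) for _ in range(n_panel)]
    completers = [(y1, y2) for y1, y2 in panel if stays_in_panel(y1, y2, rng)]
    # Refreshment sample: new random draws, observed at wave 2 only.
    refresh_y2 = [draw_person(rng)[1] for _ in range(n_refresh)]
    return panel, completers, refresh_y2


def mean(values):
    values = list(values)
    return sum(values) / len(values)


panel, completers, refresh_y2 = simulate()
true_mean = mean(y2 for _, y2 in panel)            # unobservable in practice
completer_mean = mean(y2 for _, y2 in completers)  # what the panel alone shows
refresh_mean = mean(refresh_y2)                    # what the refreshment sample shows

print(f"wave-2 mean, full panel:         {true_mean:.3f}")
print(f"wave-2 mean, completers only:    {completer_mean:.3f}")
print(f"wave-2 mean, refreshment sample: {refresh_mean:.3f}")
```

The gap between the completers' mean and the refreshment-sample mean is the kind of information that the panel data alone cannot supply, and that a selection model with a wave-2 term can use to identify its parameters.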

Type: Research Article
Information: Political Analysis, Volume 23, Issue 1, Winter 2015, pp. 92–112

DOI: https://doi.org/10.1093/pan/mpu009
Copyright: © The Author 2014. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: Replication materials are available in Si et al. (2014).

References

Albert, J. H., and Chib, S. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88(422): 669–79.

Allison, P. 2000. Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research 28:301–9.

Bartels, L. M. 1999. Panel effects in the American National Election Studies. Political Analysis 8(1): 1–20.

Behr, A., Bellgardt, E., and Rendtel, U. 2005. Extent and determinants of panel attrition in the European community household panel. European Sociological Review 21(5): 489–512.

Bhattacharya, D. 2008a. Inference in panel data models under attrition caused by unobservables. Journal of Econometrics 144(2): 430–46.

Bhattacharya, D. 2008b. Inference in panel data models under attrition caused by unobservables. Journal of Econometrics 144:430–46.

Brehm, J. 1993. The phantom respondents: Opinion surveys and political representation. Ann Arbor: University of Michigan Press.

Brown, C. H. 1990. Protecting against nonrandomly missing data in longitudinal studies. Biometrics 46(1): 143–55.

Burgette, L. F., and Reiter, J. P. 2010. Multiple imputation via sequential regression trees. American Journal of Epidemiology 172:1070–6.

Callegaro, M., and DiSogra, C. 2008. Computing response metrics for online panels. Public Opinion Quarterly 72(5): 1008–32.

Clinton, J. 2001. Panel bias from attrition and conditioning: A case study of the Knowledge Networks panel. In AAPOR 55th Annual Conference.

Cranmer, S. J., and Gill, J. 2013. We have to be discrete about this: A non-parametric imputation technique for missing categorical data. British Journal of Political Science 43(02): 425–49.

Deng, Y., Hillygus, D. S., Reiter, J. P., Si, Y., and Zheng, S. 2013. Handling attrition in longitudinal studies: The case for refreshment samples. Statistical Science 28(2): 238–56.

Diggle, P., and Kenward, M. G. 1994. Informative dropout in longitudinal data analysis. Journal of the Royal Statistical Society Series C (Applied Statistics) 43(1): 49–93.

Dunson, D. B., and Xing, C. 2009. Nonparametric Bayes modeling of multivariate categorical data. Journal of the American Statistical Association 104:1042–51.

Erosheva, E. A., Fienberg, S. E., and Junker, B. W. 2002. Alternative statistical models and representations for large sparse multi-dimensional contingency tables. Annales de la Faculté des Sciences de Toulouse 11(4): 485–505.

Frankel, L., and Hillygus, S. 2013. Looking beyond demographics: Panel attrition in the ANES and GSS. Political Analysis 22(1): 1–18.

Frick, J. R., Goebel, J., Schechtman, E., Wagner, G. G., and Yitzhaki, S. 2006. Using analysis of Gini (ANOGI) for detecting whether two subsamples represent the same universe: The German Socio-Economic Panel Study (SOEP) experience. Sociological Methods and Research 34:427–68.

Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., and Meulders, M. 2005. Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics 61:74–85.

Grimmer, J. 2010. A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis 18(1): 1–35.

Hausman, J. A., and Wise, D. A. 1979. Attrition bias in experimental and panel data: The Gary income maintenance experiment. Econometrica 47(2): 455–73.

He, Y., Zaslavsky, A. M., and Landrum, M. B. 2010. Multiple imputation in a large-scale complex survey: A guide. Statistical Methods in Medical Research 19:653–70.

Heeringa, S. 1997. Russia longitudinal monitoring survey sample attrition, replenishment, and weighting: Rounds V-VII. University of Michigan Institute for Social Research.

Henderson, M., and Hillygus, D. S. 2011. The dynamics of health care opinion, 2008–2010: Partisanship, self-interest, and racial resentment. Journal of Health Politics, Policy, and Law 36(6): 945–60.

Henderson, M., Hillygus, D., and Tompson, T. 2010. “Sour grapes” or rational voting? Voter decision making among thwarted primary voters in 2008. Public Opinion Quarterly 74(3): 499–529.

Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. 1998. Combining panel data sets with attrition and refreshment samples. Technical report 230, National Bureau of Economic Research.

Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. 2001. Combining panel data sets with attrition and refreshment samples. Econometrica 69:1645–59.

Hogan, J. W., and Daniels, M. J. 2008. Missing data in longitudinal studies. Boca Raton, FL: Chapman and Hall.

Holmes, C. C., and Held, L. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis 1(1): 145–68.

Honaker, J., and King, G. 2010. What to do about missing values in time-series cross-section data. American Journal of Political Science 54(2): 561–81.

Ishwaran, H., and James, L. F. 2001. Gibbs sampling for stick-breaking priors. Journal of the American Statistical Association 96:161–73.

Iyengar, S., Sood, G., and Lelkes, Y. 2012. Affect, not ideology: A social identity perspective on polarization. Public Opinion Quarterly 76(3): 405–31.

Keeter, S., Kennedy, C., Dimock, M., Best, J., and Craighill, P. 2006. Gauging the impact of growing nonresponse on estimates from a national RDD telephone survey. Public Opinion Quarterly 70(5): 759–79.

Kenward, M. G. 1998. Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine 17:2723–32.

Kenward, M. G., Molenberghs, G., and Thijs, H. 2003. Pattern-mixture models with proper time dependence. Biometrika 90:52–71.

King, G., Honaker, J., Joseph, A., and Scheve, K. 2001. Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95:49–69.

Kish, L., and Hess, I. 1959. A “replacement” procedure for reducing the bias of nonresponse. American Statistician 13:17–19.

Kropko, J., Goodrich, B., Gelman, A., and Hill, J. 2014. Multiple imputation for continuous and categorical data: Comparing joint multivariate normal and conditional approaches. Political Analysis. Published online doi:10.1093/pan/mpu007.

Kruse, Y., Callegaro, M., Dennis, J., Subias, S., Lawrence, M., DiSogra, C., and Tompson, T. 2009. Panel conditioning and attrition in the AP-Yahoo! News Election Panel Study. In 64th Conference of the American Association for Public Opinion Research (AAPOR). Hollywood, FL.

Kyung, M., Gill, J., and Casella, G. 2011. New findings from terrorism data: Dirichlet process random effects models for latent groups. Journal of the Royal Statistical Society Series C (Applied Statistics) 60:701–21.

Lin, I., and Schaeffer, N. C. 1995. Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly 59:236–58.

Little, R. J. A. 1993. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88:125–34.

Little, R. J. A., and Rubin, D. B. 2002. Statistical Analysis with Missing Data. 2nd ed. New York: John Wiley & Sons.

Little, R. J. A., and Wang, Y. 1996. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics 52(1): 98–111.

Meng, X. 1994. Posterior predictive p-values. Annals of Statistics 22:1142–60.

Olsen, R. J. 2005. The problem of respondent attrition: Survey methodology is key. Monthly Labor Review 128:63–71.

Olson, K., and Witt, L. 2011. Are we keeping the people who used to stay? Changes in correlates of panel survey attrition over time. Social Science Research 40(4): 1037–50.

Papaspiliopoulos, O. 2008. A note on posterior sampling from Dirichlet mixture models. Technical report, Centre for Research in Statistical Methodology, University of Warwick.

Pasek, J., Tahk, A., Lelkes, Y., Krosnick, J. A., Payne, B. K., Akhtar, O., and Tompson, T. 2009. Determinants of turnout and candidate choice in the 2008 US presidential election illuminating the impact of racial prejudice and other considerations. Public Opinion Quarterly 73(5): 943–94.

Prior, M. 2010. You've either got it or you don't? The stability of political interest over the life cycle. Journal of Politics 72:747–66.

Reiter, J. P., Raghunathan, T. E., and Kinney, S. 2006. The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology 32(2): 143–50.

Ridder, G. 1992. An empirical evaluation of some models for non-random attrition in panel data. Structural Change and Economic Dynamics 3:337–55.

Rubin, D. B. 1987. Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.

Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. 1999. Adjusting for nonignorable dropout using semiparametric nonresponse models. Journal of the American Statistical Association 94(448): 1096–120.

Schluchter, M. D. 1992. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine 11(14): 1861–70.

Sethuraman, J. 1994. A constructive definition of Dirichlet priors. Statistica Sinica 4:639–50.

Si, Y., and Reiter, J. P. 2013. Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of Educational and Behavioral Statistics 38(5): 499–521.

Si, Y., Reiter, J. P., and Hillygus, D. S. 2014. Replication data for: Semi-parametric selection models for potentially nonignorable attrition in panel studies with refreshment samples. http://dx.doi.org/10.7910/DVN/25367 (accessed April 19, 2014). IQSS Dataverse Network, V1.

Su, Y.-S., Yajima, M., Gelman, A. E., and Hill, J. 2011. Multiple imputation with diagnostics (mi) in R: Opening windows into the black box. Journal of Statistical Software 45(2): 1–31.

Thompson, M., Fong, G., Hammond, D., Boudreau, C., Driezen, P., Hyland, A., Borland, R., Cummings, K., Hastings, G., and Siahpush, M. 2006. Methods of the International Tobacco Control (ITC) four-country survey. Tobacco Control 15(Suppl. 3): iii12–iii18.

Traugott, M. W., and Tucker, C. 1984. Strategies for predicting whether a citizen will vote and estimation of electoral outcomes. Public Opinion Quarterly 48(1): 330–43.

Vehovar, V. 1999. Field substitution and unit nonresponse. Journal of Official Statistics 15:335–50.

Vermunt, J. K., Van Ginkel, J. R., Van der Ark, L. A., and Sijtsma, K. 2008. Multiple imputation of incomplete categorical data using latent class analysis. Sociological Methodology 38(1): 369–97.

Walker, S. G. 2007. Sampling the Dirichlet mixture models with slices. Communications in Statistics - Simulation and Computation 36:45–54.

Wawro, G. 2002. Estimating dynamic panel data models in political science. Political Analysis 10:25–48.

Wissen, L., and Meurs, H. 1989. The Dutch mobility panel: Experiences and evaluation. Transportation 16:99–119.

Zabel, J. 1998. An analysis of attrition in the Panel Study of Income Dynamics and the Survey of Income and Program Participation with an application to a model of labor market behavior. Journal of Human Resources 33:479–506.
