Listwise Deletion in High Dimensions

Published online by Cambridge University Press: 02 March 2022

J. Sophia Wang
Affiliation:
Graduate Student, Department of Political Science, Yale University, New Haven, CT, USA. E-mail: jinghong.wang@yale.edu
P. M. Aronow*
Affiliation:
Associate Professor, Departments of Political Science, Biostatistics, and Statistics and Data Science, Yale University, New Haven, CT, USA. E-mail: peter.aronow@yale.edu
*Corresponding author: P. M. Aronow

Abstract

We consider the properties of listwise deletion when both the sample size n and the number of variables grow large. We show that when (i) all data have some idiosyncratic missingness and (ii) the number of variables grows superlogarithmically in n, then, for large n, listwise deletion will drop all rows with probability 1. Using two canonical datasets from the study of comparative politics and international relations, we provide numerical illustrations that these problems may emerge in real-world settings. These results suggest that, in practice, using listwise deletion may mean using few of the variables available to the researcher.
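
The intuition behind this result can be sketched numerically. The following is a minimal illustration, not the authors' replication code, and it rests on a simplifying assumption that is stronger than the abstract's condition: every cell is missing independently with the same probability p. Under that assumption a row survives listwise deletion with probability (1 - p)^k, so the expected number of complete rows is n(1 - p)^k. The function names, the missingness rate p = 0.2, and the growth rates for k are all hypothetical choices made for illustration.

```python
import math
import random


def expected_complete_rows(n: int, k: int, p: float) -> float:
    """Expected number of rows that survive listwise deletion.

    Illustrative assumption: each of the n*k cells is missing
    independently with probability p, so a row is fully observed
    with probability (1 - p)**k.
    """
    return n * (1.0 - p) ** k


def simulate_complete_rows(n: int, k: int, p: float, seed: int = 0) -> int:
    """Draw one n-by-k missingness pattern and count fully observed rows."""
    rng = random.Random(seed)
    complete = 0
    for _ in range(n):
        # A row is kept only if none of its k cells is missing.
        if all(rng.random() >= p for _ in range(k)):
            complete += 1
    return complete


if __name__ == "__main__":
    p = 0.2  # hypothetical per-cell missingness rate
    for n in (100, 1_000, 10_000):
        k_log = math.ceil(math.log(n))          # logarithmic growth in n
        k_super = math.ceil(math.log(n) ** 2)   # superlogarithmic growth in n
        print(
            f"n={n:>6}  k=log n: {k_log:>3} vars, "
            f"E[complete]={expected_complete_rows(n, k_log, p):10.2f}  |  "
            f"k=(log n)^2: {k_super:>3} vars, "
            f"E[complete]={expected_complete_rows(n, k_super, p):.2e}, "
            f"simulated={simulate_complete_rows(n, k_super, p)}"
        )
```

With these particular values, the expected number of complete rows grows with n when k = ⌈log n⌉ but shrinks toward zero when k = ⌈(log n)²⌉, because the per-row survival probability then falls faster than n rises. A smaller p would push the point at which this happens to a much larger n.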

Type
Letter
Copyright
© The Author(s) 2022. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Jeff Gill

Supplementary material: PDF

Wang and Aronow supplementary material
