

  • Saraswata Chaudhuri (a1)


Cost-effective survey methods such as multi(R)-phase sampling typically generate samples that are collections of monotonic subsamples, i.e., the variables observed for the units in subsample r are also observed for the units in subsample r + 1 for r = 1,…,R – 1. These subsamples represent subpopulations that can be systematically different if the selection of a unit in each phase of sampling depends on the variables observed for that unit in past phases. This article is about optimally combining all the subsamples for the efficient estimation of a finite-dimensional parameter defined by moment restrictions on a generic target population that is an arbitrary union of these subpopulations. Only the R-th subsample is assumed to contain all the variables that are arguments of the moment function. Semiparametric efficiency bounds for estimation are obtained under a unified framework, allowing for full generality of the selection on observables in the sampling design. The contribution of each subsample toward efficient estimation is analyzed, and it turns out to differ fundamentally from that in setups where the same collection of subsamples is instead generated unplanned by unknown sampling. Uniquely, our setup enables all the subsamples to contribute to the efficient estimation for all the target populations, which we show is not possible in other setups. Efficient estimation is standard. Simulation evidence of substantive efficiency gains from using all the subsamples is provided for all the targets.
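The setup described in the abstract can be illustrated with a minimal sketch for the simplest case, R = 2. Everything in it is an assumption made for illustration and is not taken from the article: the data-generating process, the phase-2 selection probabilities (0.8 and 0.3), the function names, and the choice of estimators. In particular, `augmented_mean` is a textbook augmented inverse-probability-weighted average with a linear working regression, shown only to indicate how the phase-1 subsample can contribute; it is not claimed to be the article's efficient estimator.

```python
import random

def simulate_two_phase(n, seed=0):
    """Hypothetical two-phase design: x is observed for every unit (phase 1),
    y only for units selected in phase 2. Selection depends on x alone and its
    probability p is known by design (selection on observables)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # population mean of y is 1
        p = 0.8 if x > 0 else 0.3                # assumed design probability
        in_phase2 = rng.random() < p
        units.append((x, y if in_phase2 else None, p))
    return units

def naive_mean(units):
    """Average of y over the phase-2 units only, ignoring the design."""
    ys = [y for _, y, _ in units if y is not None]
    return sum(ys) / len(ys)

def ipw_mean(units):
    """Inverse-probability-weighted average with normalized weights."""
    num = sum(y / p for _, y, p in units if y is not None)
    den = sum(1.0 / p for _, y, p in units if y is not None)
    return num / den

def augmented_mean(units):
    """Augmented IPW: a linear working regression of y on x, fitted on the
    phase-2 units, is averaged over all units and corrected with weighted
    residuals from the phase-2 units."""
    complete = [(x, y) for x, y, _ in units if y is not None]
    xbar = sum(x for x, _ in complete) / len(complete)
    ybar = sum(y for _, y in complete) / len(complete)
    sxy = sum((x - xbar) * (y - ybar) for x, y in complete)
    sxx = sum((x - xbar) ** 2 for x, _ in complete)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    total = 0.0
    for x, y, p in units:
        fitted = intercept + slope * x
        total += fitted                  # uses x from every unit
        if y is not None:
            total += (y - fitted) / p    # weighted residual, phase-2 units only
    return total / len(units)

units = simulate_two_phase(20000, seed=1)
print("naive:", naive_mean(units))
print("ipw:", ipw_mean(units))
print("augmented:", augmented_mean(units))
```

Under these assumed settings, the naive phase-2 average is pulled well above 1 because phase 2 oversamples units with x > 0, while both weighted versions should land near the population mean of 1. The augmented version is the only one of the three that also uses the units observed in phase 1 alone.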


Corresponding author

*Address correspondence to Saraswata Chaudhuri, Department of Economics, McGill University, Montreal, Canada; e-mail:



I am very grateful to the editor P.C.B. Phillips, the co-editor P. Guggenberger, and three anonymous referees for their detailed and insightful comments. The article was previously circulated as “A Note on Efficiency Gains from Multiple Incomplete Subsamples,” but the title was modified at the suggestion of the editor. Previous versions of the article, some of which are available on the author’s webpage, benefitted from the helpful comments of A. Prokhorov, C. Muris, D. Guilkey, D. Frazier, E. Renault, F. Lange, J. Hill, J. Haushofer, J. MacKinnon, J. Wooldridge, M. Carrasco, M. Chemin, P. Saha Chaudhuri, S.J. Lee, and V. Zinde-Walsh, as well as the seminar participants at Brown, Concordia, McGill (Econ and Biostat), Queen’s, U. Canterbury, U. Montreal, U. New South Wales, UNC Chapel Hill, U. Sydney, West Virginia University, and the Midwest Econometrics Group meetings (2013).



Supplementary materials

Chaudhuri supplementary material: online supplement (PDF, 253 KB)

