Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation

Van Hanh Nguyen; Catherine Matias

doi:10.1051/ps/2013041

Published online by Cambridge University Press: 15 October 2014

Van Hanh Nguyen: Laboratoire de Mathématiques d’Orsay, Université Paris Sud, UMR CNRS 8628, Bâtiment 425, 91405 Orsay cedex, France, and Laboratoire Statistique et Génome, Université d’Évry Val d’Essonne, UMR CNRS 8071, USC INRA, 23 bvd de France, 91037 Évry, France; nvanhanh@genopole.cnrs.fr
Catherine Matias: Laboratoire Statistique et Génome, Université d’Évry Val d’Essonne, UMR CNRS 8071, USC INRA, 23 bvd de France, 91037 Évry, France; catherine.matias@genopole.cnrs.fr

Abstract

In a multiple testing context, we consider a semiparametric mixture model with two components: one component is known and corresponds to the distribution of p-values under the null hypothesis, while the other component f is nonparametric and stands for the distribution under the alternative hypothesis. Motivated by the issue of local false discovery rate estimation, we focus here on the estimation of the nonparametric unknown component f in the mixture, relying on a preliminary estimator of the unknown proportion θ of true null hypotheses. We propose two different estimators for this unknown component and study their asymptotic properties. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence, as well as a corresponding rate, for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.
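
The abstract does not state the estimators' formulas, so the following is only an illustrative sketch of how a weighted kernel estimate of f and a plug-in local false discovery rate might be computed. It rests on several assumptions that are not taken from the article: the null p-values are uniform on [0, 1], so the mixture density is g(x) = θ + (1 − θ) f(x) and the local false discovery rate is θ / g(x); each observation is weighted by an estimate of its posterior probability of coming from the alternative, 1 − θ / ĝ(x_i), clipped to [0, 1]; the kernel is Gaussian; and θ is supplied by the user rather than estimated. The function names and the bandwidth `h` are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kde(x, data, h):
    """Plain kernel density estimate of the mixture density g at points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - data[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h


def weighted_kernel_f(x, data, theta, h):
    """Weighted kernel estimate of the alternative density f (assumed form).

    Each p-value is weighted by an estimated posterior probability of
    being an alternative, 1 - theta / g_hat(x_i), clipped to [0, 1].
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    tau = np.clip(1.0 - theta / kde(data, data, h), 0.0, 1.0)
    u = (x[:, None] - data[None, :]) / h
    # Normalizing by the sum of weights makes the estimate integrate to one.
    return (gaussian_kernel(u) * tau[None, :]).sum(axis=1) / (h * tau.sum())


def local_fdr(x, data, theta, h):
    """Plug-in local FDR: theta / (theta + (1 - theta) * f_hat(x))."""
    f_hat = weighted_kernel_f(x, data, theta, h)
    return theta / (theta + (1.0 - theta) * f_hat)


# Synthetic p-values: 70% uniform nulls, 30% alternatives concentrated near 0.
rng = np.random.default_rng(0)
pvalues = np.concatenate([rng.uniform(size=700), rng.beta(0.3, 4.0, size=300)])
lfdr = local_fdr([0.01, 0.8], pvalues, theta=0.7, h=0.05)
```

In this sketch a small p-value receives a lower estimated local false discovery rate than a large one. A Gaussian kernel ignores the boundaries of [0, 1], so estimates near 0 and 1 are biased; a practical implementation would need a boundary correction and a data-driven choice of `h` and θ.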

Keywords

False discovery rate; kernel estimation; local false discovery rate; maximum smoothed likelihood; multiple testing; p-values; semiparametric mixture model

Type: Research Article
Information: ESAIM: Probability and Statistics, Volume 18, 2014, pp. 584–612

DOI: https://doi.org/10.1051/ps/2013041
Copyright: © EDP Sciences, SMAI, 2014
