How many bins should be put in a regular histogram

Lucien Birgé; Yves Rozenholc

doi:10.1051/ps:2006001

Published online by Cambridge University Press: 31 January 2006

Lucien Birgé: UMR 7599 “Probabilités et modèles aléatoires”, Laboratoire de Probabilités, boîte 188, Université Paris VI, 4 Place Jussieu, 75252 Paris Cedex 05, France; lb@ccr.jussieu.fr
Yves Rozenholc: MAP5-UMR CNRS 8145, Université Paris 5, 45 rue des Saints-Pères, 75270 Paris Cedex 06, France; yves.rozenholc@math-info.univ.paris5.fr

Abstract

Given an n-sample from some unknown density f on [0,1], it is easy to construct a histogram of the data based on some given partition of [0,1], but not much is known about an optimal choice of the partition, especially when the data set is not large, even if one restricts to partitions into intervals of equal length. Existing methods are either rules of thumb or based on asymptotic considerations, and they often involve some smoothness properties of f. Our purpose in this paper is to give an automatic, easy-to-program, and efficient method to choose the number of bins of the partition from the data. It is based on bounds on the risk of penalized maximum likelihood estimators due to Castellan and on heavy simulations, which allowed us to optimize the form of the penalty function. These simulations show that the method works quite well for sample sizes as small as 25.
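The selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the abstract does not state the penalty or the range of candidate bin counts, so the default penalty D − 1 + (log D)^2.5 and the search range 1 ≤ D ≤ n / log n used here are assumptions (the form usually attributed to this paper), and the function names `log_likelihood`, `default_penalty` and `choose_num_bins` are hypothetical.

```python
import math


def log_likelihood(sample, num_bins):
    """Log-likelihood of the regular histogram with num_bins bins on [0, 1]."""
    n = len(sample)
    counts = [0] * num_bins
    for x in sample:
        # Clamp the index so that x == 1.0 falls in the last bin.
        j = min(int(x * num_bins), num_bins - 1)
        counts[j] += 1
    # The histogram density on bin j is num_bins * counts[j] / n;
    # empty bins contribute nothing to the sum.
    return sum(c * math.log(num_bins * c / n) for c in counts if c > 0)


def default_penalty(num_bins):
    # ASSUMPTION: this penalty form is not given in the abstract.
    return num_bins - 1 + math.log(num_bins) ** 2.5


def choose_num_bins(sample, penalty=default_penalty, max_bins=None):
    """Return the bin count maximizing log-likelihood minus penalty."""
    n = len(sample)
    if max_bins is None:
        # ASSUMPTION: candidate bin counts run from 1 to floor(n / log n).
        max_bins = max(1, int(n / math.log(n))) if n > 1 else 1
    return max(
        range(1, max_bins + 1),
        key=lambda d: log_likelihood(sample, d) - penalty(d),
    )
```

Calling `choose_num_bins(sample)` on a list of values in [0,1] returns the selected number of bins; data on another interval would first need to be rescaled to [0,1]. The penalty is passed as a parameter so that a different form can be substituted without touching the rest of the sketch.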

Keywords

Regular histograms; density estimation; penalized maximum likelihood; model selection.

Type: Research Article
Information: ESAIM: Probability and Statistics, Volume 10, February 2006, pp. 24–45

DOI: https://doi.org/10.1051/ps:2006001
Copyright: © EDP Sciences, SMAI, 2006

References

H. Akaike, A new look at the statistical model identification. IEEE Trans. Automatic Control 19 (1974) 716–723.

A.R. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301–415.

L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, D. Pollard, E. Torgersen and G. Yang, Eds., Springer-Verlag, New York (1997) 55–87.

L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203–268.

G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical Report. Université Paris-Sud, Orsay (1999).

G. Castellan, Sélection d'histogrammes à l'aide d'un critère de type Akaike. CRAS 330 (2000) 729–732.

J. Daly, The construction of optimal histograms. Commun. Stat., Theory Methods 17 (1988) 2921–2931.

L. Devroye, A Course in Density Estimation. Birkhäuser, Boston (1987).

L. Devroye and L. Györfi, Nonparametric Density Estimation: The L₁ View. John Wiley, New York (1985).

L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation. Springer-Verlag, New York (2001).

D. Freedman and P. Diaconis, On the histogram as a density estimator: L₂ theory. Z. Wahrscheinlichkeitstheor. Verw. Geb. 57 (1981) 453–476.

P. Hall, Akaike's information criterion and Kullback-Leibler loss for histogram density estimation. Probab. Theory Relat. Fields 85 (1990) 449–467.

P. Hall and E.J. Hannan, On stochastic complexity and nonparametric density estimation. Biometrika 75 (1988) 705–714.

K. He and G. Meeden, Selecting the number of bins in a histogram: A decision theoretic approach. J. Stat. Plann. Inference 61 (1997) 49–59.

D.R.M. Herrick, G.P. Nason and B.W. Silverman, Some new methods for wavelet density estimation. Sankhya, Series A 63 (2001) 394–411.

M.C. Jones, On two recent papers of Y. Kanazawa. Statist. Probab. Lett. 24 (1995) 269–271.

Y. Kanazawa, Hellinger distance and Akaike's information criterion for the histogram. Statist. Probab. Lett. 17 (1993) 293–298.

L.M. Le Cam, Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York (1986).

L.M. Le Cam and G.L. Yang, Asymptotics in Statistics: Some Basic Concepts. Second Edition. Springer-Verlag, New York (2000).

J. Rissanen, Stochastic complexity and the MDL principle. Econ. Rev. 6 (1987) 85–102.

M. Rudemo, Empirical choice of histograms and kernel density estimators. Scand. J. Statist. 9 (1982) 65–78.

D.W. Scott, On optimal and data-based histograms. Biometrika 66 (1979) 605–610.

H.A. Sturges, The choice of a class interval. J. Am. Stat. Assoc. 21 (1926) 65–66.

C.C. Taylor, Akaike's information criterion and the histogram. Biometrika 74 (1987) 636–639.

G.R. Terrell, The maximal smoothing principle in density estimation. J. Am. Stat. Assoc. 85 (1990) 470–477.

M.P. Wand, Data-based choice of histogram bin width. Am. Statistician 51 (1997) 59–64.
