Selecting the right statistical model for analysis of insect count data by using information theoretic measures

G. Sileshi

doi:10.1079/BER2006449

Published online by Cambridge University Press: 09 March 2007

G. Sileshi*
Affiliation: World Agroforestry Centre (ICRAF), SADC-ICRAF Agroforestry Programme, Chitedze Agricultural Research Station, PO Box 30798, Lilongwe, Malawi
*P.O. Box X389, Cross Roads, Lilongwe, Malawi. Fax: 00265 1707323. E-mail: sgwelde@yahoo.com

Abstract

Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability, which in its different forms is a subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, negative binomial and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's information criterion and the Schwarz Bayesian information criterion were used for comparing the various models. Over 50% of the counts were zeros, even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial model provided a better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions, resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
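
The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure or data: the counts below are invented for illustration (60% zeros), only the standard Poisson and negative binomial models are fitted, and the negative binomial dispersion parameter k is approximated by a coarse grid search rather than a proper optimiser. The function names are hypothetical. The criteria use the usual definitions, AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L, where p is the number of estimated parameters, n the number of observations and L the maximised likelihood; lower values indicate the preferred model.

```python
import math


def poisson_loglik(counts):
    """Poisson log-likelihood at the maximum-likelihood rate (the sample mean)."""
    lam = sum(counts) / len(counts)
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, k):
    """Negative binomial log-likelihood with the mean fixed at the sample mean
    and dispersion parameter k (smaller k means stronger aggregation)."""
    mu = sum(counts) / len(counts)
    p = k / (k + mu)
    return sum(
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(p) + y * math.log(1.0 - p)
        for y in counts
    )


def fit_negbin(counts):
    """Approximate the maximum-likelihood k by a coarse grid search on a log scale."""
    grid = [10 ** (e / 50) for e in range(-150, 151)]  # k from 0.001 to 1000
    best_k = max(grid, key=lambda k: negbin_loglik(counts, k))
    return best_k, negbin_loglik(counts, best_k)


def aic(loglik, n_params):
    """Akaike's information criterion."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Schwarz Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2 * loglik


# Invented example counts (assumption): 12 zeros out of 20 samples.
counts = [0] * 12 + [1, 1, 2, 3, 5, 8, 13, 21]
n = len(counts)

pois_ll = poisson_loglik(counts)   # one parameter: the mean
nb_k, nb_ll = fit_negbin(counts)   # two parameters: the mean and k

print("Poisson            AIC:", aic(pois_ll, 1), "BIC:", bic(pois_ll, 1, n))
print("Negative binomial  AIC:", aic(nb_ll, 2), "BIC:", bic(nb_ll, 2, n))
```

For counts this aggregated, the negative binomial model should score lower on both criteria despite its extra parameter. The zero-inflated and overdispersion-corrected models compared in the paper would need a fuller fitting routine than this sketch provides.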

Keywords

information criterion; model uncertainty; overdispersion; zero-inflation

Type: Research Article
Information: Bulletin of Entomological Research, Volume 96, Issue 5, October 2006, pp. 479–488

DOI: https://doi.org/10.1079/BER2006449
Copyright: © Cambridge University Press 2006

References

Akaike, H. (1973) Information theory as an extension of the maximum likelihood principle. pp. 267–281 in Petrov, B.N. & Csaki, F. (Eds) Second International Symposium on Information Theory. Akademiai Kiado, Budapest.

Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference: a practical information-theoretic approach. 2nd edn. New York, Springer-Verlag.

Chatfield, C. (1995) Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A 158, 419–466.

Davis, P.M. (1994) Statistics for describing populations. pp. 33–54 in Pedigo, L.P. & Buntin, G.D. (Eds) Handbook of sampling methods for arthropods in agriculture. Boca Raton, Florida, CRC Press Inc.

Dayton, C.M. (2003) Model comparison using information measures. Journal of Modern Applied Statistical Methods 2, 281–292.

Fletcher, D., MacKenzie, D. & Villouta, E. (2005) Modelling skewed data with many zeros: a simple approach combining ordinary and logistic regression. Environmental and Ecological Statistics 12, 45–54.

Hurvich, C.M. & Tsai, C.L. (1989) Regression and time series model selection in small samples. Biometrika 76, 297–307.

Johnson, J.B. & Omland, K.S. (2004) Model selection in ecology and evolution. Trends in Ecology and Evolution 19, 101–108.

Kennedy, P.J., Conrad, K.F., Perry, J.N., Powell, D., Aegerter, J., Todd, A.D., Walters, K.F.A. & Powell, W. (2001) Comparison of two field-scale approaches for the study of effects of insecticides on polyphagous predators in cereals. Applied Soil Ecology 17, 253–266.

Kuha, J. (2004) AIC and BIC: comparison of assumptions and performance. Sociological Methods and Research 33, 188–229.

Li, C.S., Lu, J.C., Park, J., Kim, K., Brinkley, P.A. & Peterson, J.P. (1999) Multivariate zero-inflated Poisson models and their application. Technometrics 41, 29–38.

Lindsey, J.K. (1999) On the use of corrections for overdispersion. Journal of the Royal Statistical Society: Series C 48, 553–561.

MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A. & Langtimm, C.A. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–2255.

Martin, T.G., Wintle, B.A., Rhodes, J.R., Kuhnert, P.M., Field, S.A., Low-Choy, S.J., Tyre, A. & Possingham, H.P. (2005) Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters 8, 1235–1246.

McCullagh, P. & Nelder, J.A. (1989) Generalized linear models. 2nd edn. London, Chapman and Hall.

Mullahy, J. (1997) Heterogeneity, excess zeros, and the structure of count data models. Journal of Applied Econometrics 12, 337–350.

Perry, J.N., Rothery, P., Clark, S.J., Heard, M.S. & Hawes, C. (2003) Design, analysis and statistical power of the farm-scale evaluation of genetically modified herbicide-tolerant crops. Journal of Applied Ecology 40, 17–31.

SAS Institute (2003) SAS/STAT, Release 9.1, Cary, North Carolina, SAS Institute Inc.

Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics 6, 461–464.

Sileshi, G. & Kenis, M. (2003) Temporal and spatial distribution of Ootheca bennigseni Weise (Coleoptera: Chrysomelidae), a defoliator of food legumes and Sesbania sesban in southern Africa. pp. 68–69 in Stals, R. (Ed.) Proceedings of the 14th Entomological Congress, 6–9 July 2003, Pretoria, Entomological Society of Southern Africa.

Sileshi, G. & Mafongoya, P.L. (2003) Effect of rotational fallows on abundance of soil insects and weeds in maize crops in eastern Zambia. Applied Soil Ecology 23, 211–222.

Sileshi, G. & Mafongoya, P.L. (2006a) Variation in macrofaunal communities under contrasting land use systems in eastern Zambia. Applied Soil Ecology 33, 49–60.

Sileshi, G. & Mafongoya, P.L. (2006b) The short-term impact of forest fire on soil invertebrates in the miombo. Biodiversity and Conservation (in press).

Sileshi, G., Kenis, M., Ogol, C.K.P.O. & Sithanantham, S. (2001) Predators of Mesoplatys ochroptera Stål in sesbania-planted fallows in eastern Zambia. BioControl 46, 289–310.

Sileshi, G., Baumgaertner, J., Sithanantham, S. & Ogol, C.K.P.O. (2002) Spatial distribution and sampling plans for Mesoplatys ochroptera Stål (Coleoptera: Chrysomelidae) on sesbania. Journal of Economic Entomology 95, 499–506.

Sileshi, G., Girma, H. & Mafongoya, P.L. (2006) Occupancy-abundance models for predicting densities of three leaf beetles damaging the multipurpose tree Sesbania sesban in eastern and southern Africa. Bulletin of Entomological Research 96, 61–69.

Stephens, P.A., Buskirk, S.W., Hayward, G.D. & Del Rio, C.M. (2005) Information theory and hypothesis testing: a call for pluralism. Journal of Applied Ecology 42, 4–12.

Taylor, L.R. (1961) Aggregation, variance and the mean. Nature 189, 732–735.

Warton, D.I. (2005) Many zeros does not mean zero inflation: comparing the goodness-of-fit of parametric models to multivariate abundance data. Environmetrics 16, 275–289.

Wasserman, L. (2000) Bayesian model selection and model averaging. Journal of Mathematical Psychology 44, 92–107.

Welsh, A.H., Cunningham, R.B., Donnelly, C.F. & Lindenmayer, D.B. (1996) Modelling the abundance of rare species: statistical models for counts with extra zeros. Ecological Modelling 88, 297–308.

White, G.C. & Bennetts, R.E. (1996) Analysis of frequency count data using the negative binomial distribution. Ecology 77, 2549–2557.

Wilson, L.T., Room, P.M. & Bourne, A.S. (1983) Dispersion of arthropods, flower buds and fruit in cotton fields: effects of population density and season on the fit of probability distributions. Journal of the Australian Entomological Society 22, 129–134.

Zucchini, W. (2000) An introduction to model selection. Journal of Mathematical Psychology 44, 41–61.
