
Selecting the right statistical model for analysis of insect count data by using information theoretic measures

Published online by Cambridge University Press: 09 March 2007

G. Sileshi*
Affiliation:
World Agroforestry Centre (ICRAF), SADC-ICRAF Agroforestry Programme, Chitedze Agricultural Research Station, PO Box 30798, Lilongwe, Malawi
*P.O. Box X389, Cross Roads, Lilongwe, Malawi Fax: 00265 1707323 E-mail: sgwelde@yahoo.com

Abstract

Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow formal appraisal of variability, which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, negative binomial and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's information criterion and Schwarz's Bayesian information criterion were used to compare the models. Over 50% of the counts were zeros, even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model corrected for overdispersion and the standard negative binomial model described the probability distribution of seven of the 11 insects better than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common in insect count data. If not properly modelled, these properties can invalidate the normal distribution assumptions, resulting in biased estimation of ecological effects and jeopardizing the integrity of scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
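The kind of comparison summarised in the abstract can be illustrated with a minimal Python sketch. It is not the paper's analysis or its software workflow. The counts are invented for illustration, all function names are hypothetical, and only three of the six candidate models are fitted (standard Poisson, negative binomial and zero-inflated Poisson), each by maximum likelihood on an intercept-only model. The overdispersion-corrected Poisson is omitted because it has no ordinary likelihood. The sketch uses the usual definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters and n the number of observations; lower values indicate a better trade-off between fit and complexity.

```python
import math

def aic(loglik, k):
    """Akaike's information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Schwarz's Bayesian information criterion for n observations."""
    return k * math.log(n) - 2 * loglik

def poisson_loglik(counts, lam):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)

def negbin_loglik(counts, mean, size):
    # Parameterisation with variance = mean + mean**2 / size.
    p = size / (size + mean)
    return sum(
        math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
        + size * math.log(p) + x * math.log(1 - p)
        for x in counts
    )

def fit_negbin(counts, lo=-7.0, hi=7.0, tol=1e-8):
    # The sample mean is the ML estimate of the mean; the dispersion
    # parameter is found by golden-section search on log(size).
    mean = sum(counts) / len(counts)
    f = lambda t: negbin_loglik(counts, mean, math.exp(t))
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return mean, math.exp((a + b) / 2)

def zip_loglik(counts, pi, lam):
    ll = 0.0
    for x in counts:
        if x == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) + x * math.log(lam) - lam - math.lgamma(x + 1)
    return ll

def fit_zip(counts, iters=500):
    # EM algorithm: pi is the share of structural zeros, lam the Poisson mean.
    n, total = len(counts), sum(counts)
    n_zero = sum(1 for x in counts if x == 0)
    pi, lam = 0.5, total / max(n - n_zero, 1)
    for _ in range(iters):
        z = pi / (pi + (1 - pi) * math.exp(-lam))  # E-step
        pi = n_zero * z / n                        # M-step
        lam = total / (n - n_zero * z)
    return pi, lam

# Invented counts (60% zeros); NOT data from the paper.
counts = [0] * 12 + [1, 2, 3, 3, 4, 5, 6, 8]
n = len(counts)

lam_hat = sum(counts) / n
nb_mean, nb_size = fit_negbin(counts)
zip_pi, zip_lam = fit_zip(counts)

logliks = {
    "Poisson": (poisson_loglik(counts, lam_hat), 1),
    "negative binomial": (negbin_loglik(counts, nb_mean, nb_size), 2),
    "ZIP": (zip_loglik(counts, zip_pi, zip_lam), 2),
}
results = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in logliks.items()}

for name, (a_val, b_val) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:18s} AIC={a_val:8.2f}  BIC={b_val:8.2f}")
```

Which model ranks first depends entirely on the data, so the ordering printed for these invented counts says nothing about the insect datasets analysed in the paper. A real analysis would normally include covariates and use an established statistics package in place of these hand-written fitting routines.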

Type
Research Article
Copyright
Copyright © Cambridge University Press 2006

