Model Selection and Claim Frequency for Workers' Compensation Insurance

Published online by Cambridge University Press:  09 August 2013

David Pitt
Affiliation:
Department of Actuarial Studies, Macquarie University, NSW 2109, Australia, E-mail: david.pitt@mq.edu.au

Abstract

We consider a set of workers' compensation insurance claim data in which the aggregate number of losses (claims) reported to insurers is classified by the year of occurrence of the event causing the loss, the US state in which the loss event occurred, and the occupation class of the insured workers to which the loss count relates. An exposure measure, equal to the total payroll of observed workers in each three-way classification, is also included in the dataset. Data are analysed across ten different states, 24 different occupation classes and seven separate observation years. A multiple linear regression model with predictors for main effects only could be specified in $2^{23+9+1+1} = 2^{34}$ ways, theoretically more than 17 billion different possible models! In addition, one might expect the numbers of claims recorded in different years for the same state and occupation class to be positively correlated, and different modelling assumptions about the nature of this correlation should also be considered. On the other hand, it may reasonably be assumed that the numbers of losses reported from different states and from different occupation classes are independent. Our data can therefore be modelled using the statistical techniques applicable to panel data, and in this paper we work with generalised estimating equations (GEE). For model comparison, Pan (2001) suggested an alternative to the AIC, namely the quasi-likelihood under the independence model criterion (QIC). This paper develops and applies a Gibbs sampling algorithm for efficiently locating, among the more than 17 billion models that could be considered for the analysis, the model with the optimal (smallest) QIC value. The technique is illustrated using both a simulation study and the workers' compensation insurance claim data.
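
For reference, Pan's (2001) criterion for a GEE estimate $\hat{\beta}(R)$ obtained under a working correlation structure $R$ is

$$
\mathrm{QIC}(R) = -2\,Q\big(\hat{\beta}(R); I\big) + 2\,\operatorname{tr}\big(\hat{\Omega}_I \hat{V}_R\big),
$$

where $Q(\cdot\,; I)$ is the quasi-likelihood computed as if all observations were independent, $\hat{\Omega}_I$ is the negative second derivative of that quasi-likelihood (the inverse of the model-based variance under independence), and $\hat{V}_R$ is the robust (sandwich) covariance estimate of $\hat{\beta}(R)$. Purely as an illustrative assumption, since the abstract does not state the mean model, a Poisson-type log-linear model for the claim count $y_{sct}$ in state $s$, occupation class $c$ and year $t$, with log payroll as an offset, would give (up to terms not depending on $\beta$)

$$
Q(\beta; I) = \sum_{s,c,t} \big( y_{sct} \log \mu_{sct} - \mu_{sct} \big), \qquad \log \mu_{sct} = \log(\mathrm{payroll}_{sct}) + x_{sct}^{\top} \beta .
$$

Replacing the trace term by the number of regression parameters $p$ gives Pan's simpler $\mathrm{QIC}_u = -2\,Q(\hat{\beta}; I) + 2p$, the direct GEE analogue of the AIC.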

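The Gibbs search described in the abstract can be illustrated with a minimal sketch. Each candidate model is coded as a binary inclusion vector over the candidate predictors, and the sampler updates one component at a time from its full conditional under an assumed target distribution proportional to exp(-QIC/2), recording the lowest-criterion model it evaluates. The target distribution, the function names `gibbs_model_search` and `toy_qic`, and the toy criterion are hypothetical choices for illustration, not the paper's implementation; in a real analysis `criterion` would fit the GEE for the selected predictors and return its QIC (not shown).

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Model = Tuple[int, ...]  # inclusion vector: entry j is 1 if predictor j is in the model


def gibbs_model_search(
    criterion: Callable[[Model], float],
    n_predictors: int,
    n_sweeps: int = 200,
    seed: int = 0,
) -> Tuple[Model, float]:
    """Gibbs-sample inclusion vectors from pi(gamma) proportional to exp(-criterion(gamma) / 2).

    Assumed target (hypothetical). Returns the lowest-criterion model evaluated and its value.
    """
    rng = random.Random(seed)
    gamma: List[int] = [rng.randint(0, 1) for _ in range(n_predictors)]
    cache: Dict[Model, float] = {}

    def score(g: List[int]) -> float:
        key = tuple(g)
        if key not in cache:  # each distinct model is evaluated (fitted) only once
            cache[key] = criterion(key)
        return cache[key]

    best_model, best_value = tuple(gamma), score(gamma)
    for _ in range(n_sweeps):
        for j in range(n_predictors):
            # Criterion with predictor j excluded / included, all other entries held fixed.
            gamma[j] = 0
            c0 = score(gamma)
            gamma[j] = 1
            c1 = score(gamma)
            # Full conditional: P(gamma_j = 1 | rest) = 1 / (1 + exp((c1 - c0) / 2)),
            # written in a form that never overflows math.exp.
            d = (c1 - c0) / 2.0
            p1 = math.exp(-d) / (1.0 + math.exp(-d)) if d >= 0 else 1.0 / (1.0 + math.exp(d))
            gamma[j] = 1 if rng.random() < p1 else 0
            # Record the better of the two candidate models just evaluated.
            for value, bit in ((c0, 0), (c1, 1)):
                if value < best_value:
                    candidate = list(gamma)
                    candidate[j] = bit
                    best_model, best_value = tuple(candidate), value
    return best_model, best_value


# Hypothetical toy criterion standing in for "fit the GEE, return its QIC":
# a lack-of-fit charge for each omitted "true" predictor plus 2 per included predictor.
TRUE_SET = {0, 2, 5}


def toy_qic(model: Model) -> float:
    missing = sum(1 for j in TRUE_SET if model[j] == 0)
    return 10.0 * missing + 2.0 * sum(model)


if __name__ == "__main__":
    print(2 ** (23 + 9 + 1 + 1))  # size of the main-effects model space: 17179869184
    print(gibbs_model_search(toy_qic, n_predictors=8))
```

Because low-criterion models receive exponentially more probability under this target, the chain spends most of its time near good models, so only a small fraction of the more than 17 billion candidates would ever need to be fitted; the cache ensures that each visited model is fitted once.
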
Type
Research Article
Copyright
Copyright © International Actuarial Association 2010

References

Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, 716–723.
Antonio, K. and Beirlant, J. (2007) Actuarial statistics with generalized linear mixed models, Insurance: Mathematics and Economics, 40, 58–76.
Brockman, M.J. and Wright, T.S. (1992) Statistical motor rating: making effective use of your data, Journal of the Institute of Actuaries, 119, 457–543.
Cui, J. and Qian, G. (2007) Selection of working correlation structure and best model in GEE analyses of longitudinal data, Communications in Statistics – Simulation and Computation, 36(5), 987–996.
Frees, E.W., Young, V.R. and Luo, Y. (1999) Case studies using panel data models, North American Actuarial Journal, 5(4), 24–42.
Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Haberman, S. and Renshaw, A.E. (1996) Generalized linear models and actuarial science, The Statistician, 45, 407–436.
Hardin, J.W. and Hilbe, J.M. (2003) Generalized Estimating Equations. Chapman and Hall.
Heller, G.Z., Stasinopoulos, D.M., Rigby, R.A. and De Jong, P. (2007) Mean and dispersion modelling for policy claims costs, Scandinavian Actuarial Journal, 4, 281–292.
Klugman, S.A. (1992) Bayesian Statistics in Actuarial Science. Kluwer.
Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W. (2005) Applied Linear Statistical Models. 5th ed. McGraw-Hill.
Liang, K. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models, Biometrika, 73(1), 13–22.
Makov, U., Smith, A.F.M. and Liu, Y.H. (1996) Bayesian methods in actuarial science, The Statistician, 45(4), 503–515.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd ed. Chapman and Hall.
Pan, W. (2001) Akaike's information criterion in generalized estimating equations, Biometrics, 57, 120–125.
Qian, G. (1999) Computations and analysis in robust regression model selection using stochastic complexity, Computational Statistics, 14, 293–314.
Qian, G. and Field, C. (2000) Using MCMC for logistic regression model selection involving large number of candidate models, in Fang, K.T., Hickernell, F.J. and Niederreiter, H. (eds), Monte Carlo and Quasi-Monte Carlo Methods, Springer, 460–474.
Qian, G. and Zhao, X. (2007) On time series model selection involving many candidate ARMA models, Computational Statistics & Data Analysis, 51(1), 6180–6196.
R Development Core Team (2008) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Scollnik, D.P.M. (1996) An introduction to Markov chain Monte Carlo methods and their actuarial applications, Proceedings of the Casualty Actuarial Society, LXXXIII, 114–165.