Model Selection and Claim Frequency for Workers' Compensation Insurance

Jisheng Cui; David Pitt; Guoqi Qian

doi:10.2143/AST.40.2.2061136

Published online by Cambridge University Press: 09 August 2013

David Pitt: Department of Actuarial Studies, Macquarie University, NSW 2109, Australia. E-mail: david.pitt@mq.edu.au

Abstract

We consider a set of workers' compensation insurance claim data in which the aggregate number of losses (claims) reported to insurers is classified by the year of occurrence of the event causing the loss, the US state in which the loss event occurred, and the occupation class of the insured workers to whom the loss count relates. An exposure measure, equal to the total payroll of the observed workers in each three-way classification, is also included in the dataset. Data are analysed across ten states, 24 occupation classes and seven observation years. A multiple linear regression model with predictors for main effects only could be specified in $2^{23+9+1+1} = 2^{34}$ ways: theoretically more than 17 billion different models! In addition, one might expect the numbers of claims recorded in different years for the same state and the same occupation class to be positively correlated, and different assumptions about the nature of this correlation should also be considered. On the other hand, it may reasonably be assumed that the numbers of losses reported from different states and from different occupation classes are independent. Our data can therefore be modelled using the statistical techniques applicable to panel data, and in this paper we work with generalised estimating equations (GEE). For model comparison, Pan (2001) suggested an alternative to the AIC, namely the quasi-likelihood under the independence model criterion (QIC). This paper develops and applies a Gibbs sampling algorithm that efficiently locates, among the more than 17 billion candidate models, the model with the optimal (least) QIC value. The technique is illustrated using both a simulation study and the workers' compensation insurance claim data.
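The kind of search described above can be illustrated with a systematic-scan Gibbs sampler over binary inclusion indicators, one per candidate predictor. The sketch below is an illustration under stated assumptions, not the authors' algorithm. It assumes a target distribution proportional to exp(-criterion / T), where the temperature T is a hypothetical tuning parameter; it caches criterion values and records the lowest-criterion model visited. In the paper's setting the criterion would be the QIC of a fitted GEE model, QIC = -2Q(β̂; I) + 2 trace(Ω̂_I V̂_R) (Pan, 2001). Here `toy_criterion` is a hypothetical stand-in, used only so that the example runs on its own.

```python
import math
import random
from typing import Callable, Dict, Tuple

Model = Tuple[int, ...]  # inclusion indicators, 1 = predictor in the model

# Number of main-effects models quoted in the abstract: 2^(23 + 9 + 1 + 1).
N_CANDIDATES = 2 ** (23 + 9 + 1 + 1)  # 17,179,869,184


def gibbs_model_search(
    criterion: Callable[[Model], float],
    n_predictors: int,
    n_sweeps: int = 200,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[Model, float]:
    """Return the lowest-criterion model visited by a Gibbs sampler.

    Assumed target distribution: p(gamma) proportional to
    exp(-criterion(gamma) / temperature), so models with a smaller
    criterion (e.g. QIC) are visited more often.
    """
    rng = random.Random(seed)
    cache: Dict[Model, float] = {}

    def score(model: Model) -> float:
        # Each criterion evaluation (a full GEE fit in practice) is cached.
        if model not in cache:
            cache[model] = criterion(model)
        return cache[model]

    gamma = [rng.randint(0, 1) for _ in range(n_predictors)]
    best, best_score = tuple(gamma), score(tuple(gamma))

    for _ in range(n_sweeps):
        for j in range(n_predictors):
            with_j = tuple(gamma[:j] + [1] + gamma[j + 1:])
            without_j = tuple(gamma[:j] + [0] + gamma[j + 1:])
            s1, s0 = score(with_j), score(without_j)
            # Full conditional: P(gamma_j = 1 | rest) = 1 / (1 + exp(d)),
            # with d = (s1 - s0) / T, evaluated in an overflow-safe way.
            d = (s1 - s0) / temperature
            if d >= 0:
                p1 = math.exp(-d) / (1.0 + math.exp(-d))
            else:
                p1 = 1.0 / (1.0 + math.exp(d))
            gamma[j] = 1 if rng.random() < p1 else 0
            # Both neighbouring models were scored, so check both.
            for model, value in ((with_j, s1), (without_j, s0)):
                if value < best_score:
                    best, best_score = model, value
    return best, best_score


def toy_criterion(model: Model) -> float:
    """Hypothetical stand-in for QIC; the 'true' model uses predictors 0, 2, 5."""
    truth = (1, 0, 1, 0, 0, 1, 0, 0)
    mismatches = sum(a != b for a, b in zip(model, truth))
    return 10.0 * mismatches + sum(model)


best_model, best_value = gibbs_model_search(toy_criterion, n_predictors=8)
print(best_model, best_value)
```

Because one of the two models compared at each step is always the current model, which is already cached, a sweep over 34 predictors needs at most 34 new criterion evaluations. An exhaustive search would need all 2^34 fits.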

Keywords

Model selection; QIC; Longitudinal study; Workers' compensation insurance; Gibbs sampler

Type: Research Article
Information: ASTIN Bulletin: The Journal of the IAA, Volume 40, Issue 2, November 2010, pp. 779–796

Copyright: Copyright © International Actuarial Association 2010

References

Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, 716–723.

Antonio, K. and Beirlant, J. (2007) Actuarial statistics with generalized linear mixed models, Insurance: Mathematics and Economics, 40, 58–76.

Brockman, M.J. and Wright, T.S. (1992) Statistical motor rating: making effective use of your data, Journal of the Institute of Actuaries, 119, 457–543.

Cui, J. and Qian, G. (2007) Selection of working correlation structure and best model in GEE analyses of longitudinal data, Communications in Statistics – Simulation and Computation, 36(5), 987–996.

Frees, E.W., Young, V.R. and Luo, Y. (1999) Case studies using panel data models, North American Actuarial Journal, 5(4), 24–42.

Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Haberman, S. and Renshaw, A.E. (1996) Generalized linear models and actuarial science, The Statistician, 45, 407–436.

Hardin, J.W. and Hilbe, J.M. (2003) Generalized Estimating Equations. Chapman and Hall.

Heller, G.Z., Stasinopoulos, D.M., Rigby, R.A. and De Jong, P. (2007) Mean and dispersion modelling for policy claims costs, Scandinavian Actuarial Journal, 4, 281–292.

Klugman, S.A. (1992) Bayesian Statistics in Actuarial Science. Kluwer.

Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W. (2005) Applied Linear Statistical Models. 5th ed. McGraw-Hill.

Liang, K. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models, Biometrika, 73(1), 13–22.

Makov, U., Smith, A.F.M. and Liu, Y.H. (1996) Bayesian methods in actuarial science, The Statistician, 45(4), 503–515.

McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd ed. Chapman and Hall.

Pan, W. (2001) Akaike's information criterion in generalized estimating equations, Biometrics, 57, 120–125.

Qian, G. (1999) Computations and analysis in robust regression model selection using stochastic complexity, Computational Statistics, 14, 293–314.

Qian, G. and Field, C. (2000) Using MCMC for logistic regression model selection involving large number of candidate models, Monte Carlo and Quasi-Monte Carlo Methods, Fang, K.T., Hickernell, F.J. and Niederreiter, H. (eds), Springer, 460–474.

Qian, G. and Zhao, X. (2007) On time series model selection involving many candidate ARMA models, Computational Statistics & Data Analysis, 51(1), 6180–6196.

R Development Core Team (2008) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Scollnik, D.P.M. (1996) An introduction to Markov chain Monte Carlo methods and their actuarial applications, Proceedings of the Casualty Actuarial Society, LXXXIII, 114–165.