BAYESIAN ANALYSIS OF BIG DATA IN INSURANCE PREDICTIVE MODELING USING DISTRIBUTED COMPUTING

Published online by Cambridge University Press:  06 July 2017

Yanwei Zhang*
Affiliation:
Uber Technologies, 1455 Market Street, San Francisco, CA 94103, USA

Abstract

While Bayesian methods have attracted considerable interest in actuarial science, they have yet to be embraced in large-scale insurance predictive modeling applications, owing to the inefficiency of Bayesian estimation procedures. This paper presents an efficient method that parallelizes Bayesian computation using distributed computing on Apache Spark across a cluster of computers. The distributed algorithm dramatically speeds up Bayesian computation and expands the scope of applicability of Bayesian methods in insurance modeling. The empirical analysis applies a Bayesian hierarchical Tweedie model to a data set of 13 million insurance claim records. In this application, the distributed algorithm runs up to 65 times faster than the non-parallel method. The analysis demonstrates that Bayesian methods can be of great value to large-scale insurance predictive modeling.
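The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to parallelize Bayesian computation over partitioned data, not the paper's method. It assumes a random-walk Metropolis sampler in which each log-likelihood evaluation is split into per-partition sums (a "map" step) that are then added together (a "reduce" step). Two substitutions are made to keep it self-contained: a simple Poisson likelihood stands in for the hierarchical Tweedie model, and a local thread pool stands in for a Spark cluster. All function names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts chunks (a stand-in for cluster partitions)."""
    return [records[i::n_parts] for i in range(n_parts)]


def partial_loglik(chunk, log_rate):
    """Poisson log-likelihood of one chunk at a given log rate (the 'map' step)."""
    rate = math.exp(log_rate)
    return sum(y * log_rate - rate - math.lgamma(y + 1) for y in chunk)


def distributed_loglik(parts, log_rate, pool):
    """Add up the per-partition log-likelihoods (the 'reduce' step)."""
    return sum(pool.map(lambda chunk: partial_loglik(chunk, log_rate), parts))


def log_prior(log_rate):
    """Assumed Normal(0, 10^2) prior on the log rate, up to an additive constant."""
    return -0.5 * (log_rate / 10.0) ** 2


def metropolis(parts, n_iter, step, seed=0):
    """Random-walk Metropolis; every likelihood evaluation is split across partitions."""
    rng = random.Random(seed)
    draws = []
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        current = 0.0
        current_lp = distributed_loglik(parts, current, pool) + log_prior(current)
        for _ in range(n_iter):
            proposal = current + rng.gauss(0.0, step)
            proposal_lp = distributed_loglik(parts, proposal, pool) + log_prior(proposal)
            # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
            if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
            draws.append(current)
    return draws
```

Because the log-likelihood of independent records is a sum, the partitioned evaluation returns the same value as a single pass over all records, up to floating-point rounding. Only the scalar partial sums need to travel between workers and the driver, which is what makes this pattern a natural fit for a cluster.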

Type
Research Article
Copyright
Copyright © Astin Bulletin 2017 
