BAYESIAN ANALYSIS OF BIG DATA IN INSURANCE PREDICTIVE MODELING USING DISTRIBUTED COMPUTING

Yanwei Zhang

doi:10.1017/asb.2017.15

Published online by Cambridge University Press: 06 July 2017

Yanwei Zhang*
Affiliation: Uber Technologies, 1455 Market Street, San Francisco, CA 94103, USA
*E-mail: actuary_zhang@hotmail.com

Abstract

While Bayesian methods have attracted considerable interest in actuarial science, they have yet to be embraced in large-scale insurance predictive modeling applications because Bayesian estimation procedures are computationally inefficient. The paper presents an efficient method that parallelizes Bayesian computation using distributed computing on Apache Spark across a cluster of computers. The distributed algorithm dramatically speeds up Bayesian computation and broadens the applicability of Bayesian methods in insurance modeling. The empirical analysis applies a Bayesian hierarchical Tweedie model to a large data set of 13 million insurance claim records. In this application, the distributed algorithm runs up to 65 times faster than the non-parallel method. The analysis demonstrates that Bayesian methods can be of great value to large-scale insurance predictive modeling.

Keywords

Bayesian; big data; distributed computing; predictive modeling; ratemaking; Spark

Type: Research Article
Information: ASTIN Bulletin: The Journal of the IAA, Volume 47, Issue 3, September 2017, pp. 943–961

DOI: https://doi.org/10.1017/asb.2017.15
Copyright: Copyright © Astin Bulletin 2017
