Application of Two Unsupervised Learning Techniques to Questionable Claims: PRIDIT and Random Forest

Louise A. Francis

doi:10.1017/CBO9781139342681.008

7 - Application of Two Unsupervised Learning Techniques to Questionable Claims: PRIDIT and Random Forest

Published online by Cambridge University Press: 05 August 2016

Edited by Glenn Meyers and

Affiliations:
Louise A. Francis: Casualty Actuarial Society (CAS)
Edward W. Frees: University of Wisconsin, Madison
Glenn Meyers: ISO Innovative Analytics, New Jersey
Richard A. Derrig: Temple University, Philadelphia

Summary

Chapter Preview. Predictive modeling can be divided into two major kinds, supervised and unsupervised learning, which are distinguished primarily by whether the modeling data contain a dependent (target) variable. Supervised learning approaches probably account for the majority of modeling analyses. The topic of unsupervised learning was introduced in Chapter 12 of Volume I of this book. This chapter follows up with an introduction to two advanced unsupervised learning techniques: PRIDIT (Principal Components of RIDITs) and Random Forest (a tree-based data-mining method that is most commonly used in supervised learning applications). The methods are applied to an automobile insurance database to model questionable claims. A few additional unsupervised learning methods used for visualization, including multidimensional scaling, are also briefly introduced.
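To make the PRIDIT idea concrete, here is a minimal NumPy sketch, not the chapter's code. It assumes every predictor is ordinal-coded so that a higher code means a more suspicious response, and it uses the Brockett and Levine (1977) RIDIT form as applied by Brockett et al. (2003): each response scores the share of the sample below its category minus the share above it. The first principal component of the resulting RIDIT matrix then supplies variable weights and a claim-level suspicion score. The function names, the toy data, and the sign convention for the eigenvector are illustrative assumptions.

```python
import numpy as np

def ridit_scores(column):
    """Brockett-Levine RIDIT score for one ordinal column.

    A response in category i scores
        (share of responses below i) - (share of responses above i),
    so scores lie in [-1, 1] and average to zero over the sample.
    Assumption: categories are ranked by numeric code (higher = more suspicious).
    """
    _, inverse, counts = np.unique(column, return_inverse=True, return_counts=True)
    p = counts / counts.sum()
    cum = np.cumsum(p)
    below = cum - p            # proportion strictly below each category
    above = 1.0 - cum          # proportion strictly above each category
    return (below - above)[inverse.ravel()]

def pridit(X):
    """Return (claim scores, variable weights) from the first principal component of the RIDIT matrix."""
    F = np.column_stack([ridit_scores(X[:, j]) for j in range(X.shape[1])])
    # Each RIDIT column has mean zero, so F'F is proportional to its covariance matrix.
    _eigvals, eigvecs = np.linalg.eigh(F.T @ F)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                            # loadings of the first principal component
    if w.sum() < 0:                               # eigenvector sign is arbitrary; fix a convention
        w = -w
    return F @ w, w

# Toy example: 4 claims, 3 binary red-flag indicators (hypothetical data).
X = np.array([[0, 0, 0],
              [0, 1, 0],
              [1, 1, 1],
              [1, 0, 1]])
scores, weights = pridit(X)
print(np.round(scores, 3), np.round(weights, 3))
```

Brockett et al. (2003) add further normalization of the weights and scores; the sketch keeps only the core "RIDIT, then first principal component" step.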

Databases used for detecting questionable claims often do not contain a questionable-claims indicator that could serve as a dependent variable, and unsupervised learning methods are often used to address this limitation. For this research, a simulated database was developed from actual data; it contains features observed in actual questionable-claims data. The methods in this chapter are applied to this simulated database, which is available online at the book's website.
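Random Forest clustering (Shi and Horvath, 2006) works around the missing indicator by manufacturing one: the real records form one class, a synthetic copy drawn independently from each variable's marginal distribution forms the other, and a forest is trained to tell them apart. Records that repeatedly land in the same terminal nodes are then treated as similar. The NumPy sketch covers the two steps around the forest fit; the function names are hypothetical, and the forest itself is assumed to come from a library, for example scikit-learn's `RandomForestClassifier`, whose `apply()` method returns each sample's leaf index in every tree.

```python
import numpy as np

def make_synthetic(X, rng):
    """Draw each column independently (with replacement) from its empirical marginal.

    Marginal distributions are preserved, but dependence between variables is destroyed;
    that dependence is what the forest must learn to separate real from synthetic rows.
    """
    n = X.shape[0]
    return np.column_stack([rng.choice(X[:, j], size=n, replace=True) for j in range(X.shape[1])])

def build_training_set(X, seed=0):
    """Stack real rows (label 1) on top of synthetic rows (label 0)."""
    rng = np.random.default_rng(seed)
    X_syn = make_synthetic(X, rng)
    data = np.vstack([X, X_syn])
    labels = np.concatenate([np.ones(len(X), dtype=int), np.zeros(len(X_syn), dtype=int)])
    return data, labels

def proximity_from_leaves(leaves):
    """leaves[i, t] is the terminal-node id of sample i in tree t.

    Returns an n x n matrix whose (i, k) entry is the share of trees in which
    samples i and k land in the same terminal node.
    """
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]   # 1 where two samples share a leaf in tree t
    return prox / n_trees

# Tiny hand-made leaf matrix: 3 samples, 2 trees.
leaves = np.array([[0, 1],
                   [0, 2],
                   [3, 2]])
prox = proximity_from_leaves(leaves)
dissimilarity = 1.0 - prox   # input for hierarchical clustering, PAM, or MDS
```

In practice the forest would be fit on `data, labels`, and the leaf matrix would come from applying the fitted forest to the real rows only. Other synthetic-data schemes and dissimilarity transforms are possible; this sketch makes the simplest choices.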

Introduction

An introduction to unsupervised learning techniques as applied to insurance problems is provided by Francis (2014) as part of Predictive Modeling Applications in Actuarial Science, Volume I, a text intended to introduce actuaries and insurance professionals to predictive modeling techniques. As an introductory work, it focused on two classical approaches: principal components and clustering. Both are standard statistical methods that have been in use for many decades and are well known to statisticians. The classical approaches have been augmented by many other unsupervised learning methods, such as neural networks, association rules, and link analysis. Although these methods are frequently cited for unsupervised learning, only the Kohonen neural network method is briefly discussed in this chapter. The two methods featured here, PRIDIT and Random Forest clustering, are less well known and less widely used. Brockett et al. (2003) introduced the application of PRIDITs to the detection of questionable claims in insurance, and Lieberthal (2008) has applied the PRIDIT method to hospital quality studies.
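Classical (Torgerson) multidimensional scaling shows how the visualization methods mentioned in the preview relate to principal components. It recovers low-dimensional coordinates from a matrix of pairwise distances, and when those distances are Euclidean its coordinates coincide, up to sign, with principal component scores. It can equally be applied to a non-Euclidean dissimilarity such as 1 minus a Random Forest proximity. This minimal NumPy sketch covers the classical metric variant only, and the function name is our own; other MDS variants exist and the chapter may use a different one.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)        # negative values (non-Euclidean input) set to 0
    return eigvecs[:, top] * np.sqrt(lam)

# Three points on a line at 0, 1 and 3: a 1-D embedding should reproduce their distances.
pts = np.array([0.0, 1.0, 3.0])
D = np.abs(pts[:, None] - pts[None, :])
Y = classical_mds(D, k=1)
D_hat = np.abs(Y[:, 0][:, None] - Y[:, 0][None, :])
```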

Type: Chapter
Information: Predictive Modeling Applications in Actuarial Science, pp. 180–207

Publisher: Cambridge University Press

Print publication year: 2016

References

Aldenderfer, M., and R. Blashfield. Cluster Analysis. Sage, Thousand Oaks, CA, 1984.

Breiman, L. Random Forests. Machine Learning, 45(1): 5–32, 2001. Also available at https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf.

Breiman, L., A. Cutler, and W. Liaw. Package randomForest. http://www.r-project.org/.

Breiman, L., J. H. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Chapman and Hall/CRC, Boca Raton, FL, 1984.

Brockett, P. L., R. A. Derrig, L. L. Golden, A. Levine, and M. Alpert. Fraud classification using principal component analysis of RIDITs. Journal of Risk and Insurance, 69(3): 341–371, 2003.

Brockett, P., and A. Levine. On a characterization of RIDITs. Annals of Statistics, 5(6): 1245–1248, 1977.

Brockett, P., X. Xia, and R. Derrig. Use of Kohonen's self organized feature maps to uncover automobile bodily injury claims fraud. Journal of Risk and Insurance, 65(2): 245–274, 1998.

Bross, I. How to use Ridit analysis. Biometrics, 14(1): 18–38, 1958.

Coaley, K. An Introduction to Psychological Assessment and Psychometrics, Sage, Thousand Oaks, CA, 2010.

Derrig, R. Using predictive analytics to uncover questionable claims. Presentation to the International Association of Special Investigative Units, 2013.

Derrig, R., and L. Francis. Distinguishing the forest from the trees. Variance, 2: 184–208, 2008. Available at: http://www.casact.org/research/dare/index.cfm?fa=view&abstrID=6511.

Derrig, R., and K. Ostaszewski. Fuzzy techniques of pattern recognition in risk and claim classification. Journal of Risk and Insurance, 62(3): 447–482, 1995.

Derrig, R., and H. Weisberg. A report on the AIB Study of 1993 PIP Claims, Part 1, Identification and investigation of suspicious claims. Automobile Insurers Bureau of Massachusetts, 1995.

DeVille, B. Decision Trees for Business Intelligence and Data Mining, SAS Institute, 2006.

Francis, L. Martian Chronicles: Is MARS better than neural networks. 2003 CAS Winter Forum, 2003. http://www.casact.org/research/dare/index.cfm?fa=view&abstrID=5293.

Francis, L. Review of PRIDIT. Presentation at CAS Ratemaking Seminar, 2006.

Francis, L. Unsupervised learning applied to insurance data, Salford Data Mining Conference, 2012.

Francis, L. Unsupervised learning. In E. W. Frees, G. Meyers, and R. A. Derrig (eds.), Predictive Modeling Applications in Actuarial Science: Volume I, Predictive Modeling Techniques, pp. 280–311. Cambridge University Press, Cambridge, 2014.

Friedman, J. Stochastic gradient boosting. http://statweb.stanford.edu/~jhf/ftp/stobst.pdf, 1999.

Guillén, M. Regression with categorical dependent variables. In E. W. Frees, G. Meyers, and R. A. Derrig (eds.), Predictive Modeling Applications in Actuarial Science: Volume I, Predictive Modeling Techniques, pp. 65–86. Cambridge University Press, Cambridge, 2014.

Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2003.

Insurance Information Institute. Information on fraud. http://www.iii.org/facts_statistics/fraud.html.

James, G., D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, New York, 2013.

Kaufman, L., and P. Rousseeuw. Finding Groups in Data. John Wiley, New York, 1990.

Kim, J., and C. W. Mueller. Factor Analysis: Statistical Methods and Practical Issues. Sage, Thousand Oaks, CA, 1978.

Kruskal, J., and M. Wish. Multidimensional Scaling. Sage University Press, 1978.

Lieberthal, D. R. Hospital quality: A PRIDIT approach. Health Services Research, 43(3): 988–1005, 2008.

Maechler, M. Package “cluster,” 2012. www.r-project.org.

Shi, T., and S. Horvath. Unsupervised learning with Random Forest predictors. Journal of Computational and Graphical Statistics, 15(1): 118–138, 2006.

Shi, T., and S. Horvath. A Tutorial for RF Clustering, 2007. http://labs.genetics.ucla.edu/horvath/RFclustering/RFclustering.htm.

Smith, L. A tutorial on principal components. http://www.sccg.sk/~haladova/principal_components.pdf, 2002.

Venables, W., and B. Ripley. Modern Applied Statistics with S-PLUS. Springer, New York, 1999.

Viaene, S. Learning to detect fraud from enriched insurance claims data: Context, theory and application. PhD dissertation, Katholieke Universiteit Leuven, 2002.

Wehrens, R., and L. Buydens. Self- and super-organizing maps in R: The kohonen package. Journal of Statistical Software, 21(5), 2007.
