  • Print publication year: 2014
  • Online publication date: August 2014

3 - Regression with Categorical Dependent Variables

from II - Predictive Modeling Foundations


Chapter Preview. This chapter presents regression models in which the dependent variable is categorical, whereas the covariates can be either categorical or continuous. The first part presents binary dependent variable models; the second part covers general categorical dependent variable models, in which the dependent variable has more than two outcomes. The chapter is illustrated with datasets inspired by real-life situations. It also provides the corresponding R programs for estimation, which are based on the R function glm and the R package mlogit. The same output can be obtained with SAS or similar software for estimating the models presented in this chapter.

Coding Categorical Variables

Categorical variables measure qualitative traits; in other words, they evaluate concepts that can be expressed in words. Table 3.1 presents examples of variables that are measured on a categorical scale and are often found in insurance company databases. These variables are also called risk factors when they denote characteristics that are associated with losses.

Categorical variables must have mutually exclusive outcomes. The number of categories is the number of possible response levels. For example, if we focus on insurance policies, we can have a variable such as TYPE OF POLICY CHOSEN with as many categories as there are contract options offered to the customer.
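One common way to bring such a variable into a regression is indicator (dummy) coding: choose one category as the reference level and create a 0/1 column for each remaining category. The following is a minimal sketch in Python (the chapter's own programs are written in R). The policy type names and the helper `dummy_code` are hypothetical and serve only as illustration; they are not taken from the chapter's datasets.

```python
# Hypothetical values of TYPE OF POLICY CHOSEN for four customers.
# The category names are illustrative assumptions, not the chapter's data.
policies = ["third_party", "comprehensive", "third_party", "fire_theft"]


def dummy_code(values, reference=None):
    """Indicator-code a categorical variable.

    Returns (kept_levels, rows), where each row holds one 0/1 indicator
    per non-reference level. The reference level is coded as all zeros.
    """
    levels = sorted(set(values))  # distinct categories, in a stable order
    if reference is None:
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    kept_levels = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept_levels]
            for value in values]
    return kept_levels, rows


kept_levels, rows = dummy_code(policies, reference="third_party")
for policy, row in zip(policies, rows):
    print(policy, dict(zip(kept_levels, row)))
```

Because the categories are mutually exclusive, each row contains at most one 1, and a variable with k categories needs only k - 1 indicator columns once a reference level is fixed.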
