This chapter focuses on what might be considered the ‘traditional’ statistical approaches to class prediction, in particular general and generalised linear models (GLMs) such as discriminant analysis and logistic regression. It also describes a more recent, related method called generalised additive modelling (GAM).
The role of a statistical classifier is to identify which class is the most probable, given the available information and certain assumptions about the model structure and the probability distributions. The available information is derived from the values of the predictor variables, while the assumptions depend on the approach.
A typical statistical classifier or model aims to describe the relationship between the class and the predictors. Usually, there will also be some unexplained variation (prediction errors), or noise, that we hope is caused by chance factors and not by some deficiency in the model. The statistical classifier, ignoring the noise, can be summarised as class = f(∑wᵢxᵢ), where the wᵢ are weights that are applied to the p predictors (xᵢ) to produce a weighted sum from which the class is derived. This format specifies a classifier that is linear with respect to its parameters and that can be envisaged as a plane separating the classes in p-dimensional predictor space. The problem becomes one of identifying the appropriate values for the weights. One way of thinking about this is in terms of a recipe: changing the proportions (weights) of the ingredients will change the characteristics of the finished food.
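The weighted-sum form described above can be illustrated with a minimal Python sketch. It assumes a two-class problem and takes f to be a logistic function followed by a 0.5 cut-off, which is only one possible choice of f. The weights, the predictor values and the function names (weighted_sum, logistic, classify) are hypothetical and chosen purely for illustration; in practice the weights would be estimated from data.

```python
import math


def weighted_sum(weights, predictors):
    """Return the sum of w_i * x_i over the p predictors."""
    return sum(w * x for w, x in zip(weights, predictors))


def logistic(z):
    """Map a weighted sum onto the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(weights, predictors, threshold=0.5):
    """Assign class 1 if the transformed weighted sum reaches the threshold, else class 0."""
    return 1 if logistic(weighted_sum(weights, predictors)) >= threshold else 0


# Hypothetical weights for p = 2 predictors (not estimated from any data).
weights = [0.8, -1.2]

case_a = [2.0, 0.5]  # weighted sum = 1.6 - 0.6 = 1.0
case_b = [0.5, 2.0]  # weighted sum = 0.4 - 2.4 = -2.0

print(classify(weights, case_a))
print(classify(weights, case_b))

# Changing the weights (the 'recipe') changes the predicted class for the same case.
print(classify([-0.8, 1.2], case_a))
```

With these illustrative weights, the cases for which the weighted sum equals zero form the boundary between the two classes, which is a straight line when p = 2 and a plane in higher dimensions.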