Categorical Responses
Data collection often involves classifying individual observations into mutually exclusive categories. For example, industrial experiments can involve classifications of product testing as a success or failure; education studies can classify students as unsatisfactory, partially proficient, proficient, or advanced; health studies often identify diseases as present or not present. When such types of information are considered as responses in a statistical model, normal regression techniques are insufficient as categorical responses cannot be assumed to behave normally, as we saw with the poor fit of some of the normal regression models from Chapter 3. In fact, categorical data routinely involve highly skewed data, nonconstant variation in the response, and nonlinear relationships with predictors. As a consequence, researchers prefer to use nonlinear logistic regression modeling in lieu of normal least squares methods.
When modeling categorical responses it is most common to model the probability that an observational unit is classified according to any of the possible response groups. With two possible classifications, a binary response, ordinary logistic regression is most often applied to model the probability associated with one of the two outcome groups. Alternative methods exist, but have disadvantages in interpretations as compared to the logistic regression model. When considering a nominal response with more than two response classifications, multinomial logistic regression can be applied. When considering an ordinal response with classifications that can be arranged by some measure of magnitude, a number of ordinal multinomial logistic regression models can be used.
While all logistic models involve probabilities associated with different response classifications, interpretations and significance are often made with respect to probability comparison statistics such as odds and odds ratios. Logistic regression models allow for nonlinear relationships between response probabilities and predictors, and allow for nonconstant variation in the response, although the nonconstant variation is expected to behave according to a specific relationship with the response probability.
Binary Logistic Regression
Consider the case of a binary response variable, such as pass versus fail, approved versus denied, or operational versus damaged. Binary logistic regression can be used to model the probability associated with one of the response classifications, often referred to as the “success” classification.