These data (Table G.1) are part of a data set collected by Jenkins (1991). They are simple measures and attributes recorded from a sample of 87 adults. They are used in this book to test whether a person's smoking habits can be predicted from three continuous and two categorical predictors. The continuous predictors are white blood cell count (wbc; × 10⁹ l⁻¹), body mass index (bmi; kg m⁻²) and age (years). The categorical predictors are gender and ABO blood type. The class variable is a binary indicator of whether a person smokes (Y or N). The quantity of tobacco products used by each person was not recorded.
Descriptive statistics and relationships between variables
None of the continuous predictors are correlated with each other, either overall or within the smoking groups. Neither of the categorical predictors is significantly associated with a person's smoking habits (chi-square analysis). In addition, there is no significant association between gender and ABO blood type. White blood cell count and body mass index both differ significantly between the two smoking classes. The white blood cell count is higher in the non-smokers but the body mass index is lower. However, if the p value is corrected for multiple testing, the difference in body mass index is no longer significant. The mean ages are very similar in the two groups.
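The effect of a multiple-testing correction on a borderline result can be illustrated with a minimal sketch. The text does not say which correction was applied, so a Bonferroni correction over three comparisons (one per continuous predictor) is assumed here. The function name `bonferroni` and the raw p values are hypothetical and are not taken from the data set.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p values.

    Each raw p value is multiplied by the number of tests and capped at 1.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for the three two-group comparisons.
# They are illustrative only and are NOT results from Table G.1.
raw = {"wbc": 0.001, "bmi": 0.03, "age": 0.80}

adjusted = dict(zip(raw, bonferroni(list(raw.values()))))

alpha = 0.05  # conventional significance threshold (an assumption)
for name, p in adjusted.items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name}: raw p = {raw[name]:.3f}, adjusted p = {p:.3f} ({verdict})")
```

With these made-up values, the bmi comparison falls below 0.05 before adjustment but not after it, which is the pattern described above. A less conservative method, such as Holm's step-down procedure, could lead to a different conclusion for results near the threshold.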