In Chapter 1 we introduced several substantive problems that mainly involved scientific studies. In this chapter, we return to these problems. The goal here is not simply to illustrate semiparametric modeling techniques but to show how these techniques can be integrated into scientific studies. Analyses for about half of the studies have recently been published, so to save space we simply refer the interested reader to the relevant journal article.
Cancer Rates on Cape Cod
An analysis of the Cape Cod cancer data is given in French and Wand (2003). In their presentation, a logistic geoadditive model (Section 13.6) leads to maps showing regions of elevated relative cancer risk after accounting for age and smoking status. The model developed there also accounts for missingness (missing values) in the smoking variable.
Assessing the Carcinogenicity of Phenolphthalein
Parise and colleagues (2001) used semiparametric logistic mixed models to assess the carcinogenicity of phenolphthalein. After adjusting for rodent weight, they were not able to find a significant dose effect for phenolphthalein.
Salinity and Fishing in North Carolina
Real data sets often illustrate several different statistical principles. The salinity data set is not simply an example of semiparametric modeling; it also shows the differing effects of outliers on parametric and nonparametric modeling.
The salinity data are introduced in Section 1.2. Recall the definitions of the variables: salinity is the measured value of salinity in Pamlico Sound, lagged.sal is salinity two weeks earlier, and discharge is the amount of fresh water flowing into the sound from rivers. In this example there are two unusual values of discharge, and the question naturally arises of whether these data points should be included.
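To make the contrast between parametric and nonparametric fits concrete, the following minimal sketch uses synthetic data, not the Pamlico Sound measurements. All numbers, the helper names linear_slope and kernel_smooth, and the Gaussian kernel smoother (a simple stand-in for the smoothers used in this book) are assumptions chosen for illustration only. It compares how two high-leverage predictor values change a global straight-line fit versus a local estimate in the bulk of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption; NOT the salinity data set).
# Most predictor values lie in a narrow range; two lie far to the right.
x_bulk = rng.uniform(20.0, 27.0, size=26)
y_bulk = 12.0 - 0.1 * (x_bulk - 20.0) + rng.normal(0.0, 0.3, size=26)
x_unusual = np.array([33.0, 38.0])
y_unusual = np.array([5.0, 4.0])

x_all = np.concatenate([x_bulk, x_unusual])
y_all = np.concatenate([y_bulk, y_unusual])


def linear_slope(x, y):
    """Slope of the ordinary least-squares straight line (parametric fit)."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


def kernel_smooth(x, y, x0, bandwidth):
    """Gaussian-kernel weighted mean of y at x0 (a simple local smoother)."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)


# Global linear fit: every observation influences the slope.
slope_without = linear_slope(x_bulk, y_bulk)
slope_with = linear_slope(x_all, y_all)

# Local fit at a point inside the bulk: distant observations get ~zero weight.
smooth_without = kernel_smooth(x_bulk, y_bulk, x0=23.0, bandwidth=1.0)
smooth_with = kernel_smooth(x_all, y_all, x0=23.0, bandwidth=1.0)

print("linear slope, unusual points excluded:", slope_without)
print("linear slope, unusual points included:", slope_with)
print("local estimate at x0=23, excluded:", smooth_without)
print("local estimate at x0=23, included:", smooth_with)
```

In this synthetic setting the two unusual points pull the straight-line slope noticeably, whereas the local estimate in the bulk of the data is essentially unchanged; the local fit near the unusual points themselves would, of course, be driven almost entirely by them.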