Dietary supplements are commonly consumed but may not be beneficial for everyone. Supplement users are known to have healthy behaviour characteristics, but concordance between spouses living in the same household has not previously been investigated, and such concordance may be an important behavioural determinant.
Design
Prospective cohort study, cross-sectional data analysis.
Setting
European Prospective Investigation into Cancer in Norfolk (EPIC-Norfolk) in the UK, recruitment between 1993 and 1998.
Subjects
Married (or living as married) participants sharing a household, who attended a health examination and completed a 7 d diet diary were included in the analysis (n 11 060). The age range was 39–79 years.
Results
Nearly 75 % of the households in EPIC-Norfolk were concordant in their supplement use, with 46·7 % not using supplements and 27·0 % using supplements. Concordance increased with age; the percentage of concordant couples varied less by other sociodemographic characteristics. Participants who had a spouse who used a supplement were nearly nine times more likely to use a supplement (unadjusted). Depending on participants’ sex and type of supplement used, odds ratios for ‘supplement use by spouse’ in the prediction of participants’ supplement use varied between 6·2 and 11·7 adjusted for participants’ age, smoking status, BMI, social class, education level and physical activity.
Conclusions
‘Supplement use by spouse’ is an independent and the strongest predictor of participants’ supplement use. This phenomenon can be useful in the design of studies and health interventions; or when assessing risk of excessive intake from dietary supplements.
The case-control approach is a powerful method for investigating factors that may explain a particular event. It is extensively used in epidemiology to study disease incidence, one of the best-known examples being Bradford Hill and Doll's investigation of the possible connection between cigarette smoking and lung cancer. More recently, case-control studies have been increasingly used in other fields, including sociology and econometrics. With a particular focus on statistical analysis, this book is ideal for applied and theoretical statisticians wanting an up-to-date introduction to the field. It covers the fundamentals of case-control study design and analysis as well as more recent developments, including two-stage studies, case-only studies and methods for case-control sampling in time. The latter have important applications in large prospective cohorts which require case-control sampling designs to make efficient use of resources. More theoretical background is provided in an appendix for those new to the field.
This book is about the planning and analysis of a special kind of investigation: a case-control study. We use this term to cover a number of different designs. In the simplest form individuals with an outcome of interest, possibly rare, are observed and information about past experience is obtained. In addition corresponding data are obtained on suitable controls in the hope of explaining what influences the outcome. In this book we are largely concerned with binary outcomes, for example indicating disease diagnosis or death. Such studies are reasonably called retrospective as contrasted with prospective studies, in which one records explanatory features and then waits to see what outcome arises. In retrospective studies we are studying the causes of effects and in prospective studies we are studying the effects of causes. We also discuss some extensions of case-control studies to incorporate temporality, which may be more appropriately viewed as a form of prospective study. The key aspect of all these designs is that they involve a sample of the underlying population that motivates the study, in which individuals with certain outcomes are strongly over-represented.
While we shall concentrate on the many special issues raised by such studies, we begin with a brief survey of the general themes of statistical design and analysis. We use a terminology deriving in part from epidemiological applications although the ideas are of much broader relevance.
We start the general discussion by considering a population of study individuals, patients, say, assumed to be statistically independent.
• A case-control study is a retrospective observational study and is an alternative to a prospective observational study. Cases are identified in an underlying population and a comparable control group is sampled.
• In the standard design exposure information is obtained retrospectively, though this is not necessarily the case if the case-control sample is nested within a prospective cohort.
• Prospective studies are not cost-effective for rare outcomes. By contrast, in a case-control study the ratio of cases to controls is higher than in the underlying population in order to make more efficient use of resources.
• There are two main types of case-control design: matched and unmatched.
• The odds ratio is the most commonly used measure of association between exposure and outcome in a case-control study.
• Important extensions to the standard case-control design include the explicit incorporation of time into the choice of controls and into the analysis.
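As an illustration of the odds ratio mentioned in the summary above, the following is a minimal sketch for a single unmatched 2x2 table. The function name, the cell labels and the counts are hypothetical, and the interval uses the usual large-sample (Woolf-type) standard error of the log odds ratio, which is one common choice rather than a method prescribed by the text.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and large-sample confidence interval from a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    or_hat = (a * d) / (b * c)
    # Large-sample standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_hat)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return or_hat, lower, upper

# Hypothetical counts, for illustration only.
or_hat, lower, upper = odds_ratio_with_ci(40, 60, 20, 80)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric on the log scale, which is why the limits are computed there and then exponentiated.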
Defining a case-control study
Consider a population of interest, for example the general population of the UK, perhaps restricted by gender or age group. We may call a representation of the process by which exposures X and outcomes Y occur in the presence of intrinsic features W the population model. As noted in the Preamble, such a system may be investigated prospectively or retrospectively; see Figure 1.1. In a prospective or cohort study a suitable sample of individuals is chosen to represent the population of interest, values of (W, X) are determined and the individuals are followed through time until the outcome Y can be observed.
The retrospective case-control approach provides a powerful method for studying rare events and their dependence on explanatory features. The method is extensively used in epidemiology to study disease incidence, one of the best-known early examples being the investigation by Bradford Hill and Doll of the possible impact of smoking and pollution on lung cancer. More recently the approach has been ever more widely used, by no means only in an epidemiological setting. There have also been various extensions of the method.
A definitive account in an epidemiological context was given by Breslow and Day in 1980 and their book remains a key source with many important insights. Our book is addressed to a somewhat more statistical readership and aims to cover recent developments. There is an emphasis on the analysis of data arising in case-control studies, but we also focus in a number of places on design issues. We have tried to make the book reasonably self-contained; some familiarity with simple statistical methods and theory is assumed, however. Many methods described in the book rely on the use of maximum likelihood estimation, and the extension of this to pseudolikelihoods is required in the later chapters. We have therefore included an appendix outlining some theoretical details.
There is an enormous statistical literature on case-control studies. Some of the most important fundamental work appeared in the late 1970s, while the later 1980s and the 1990s saw the establishment of methods for case-control sampling in time.
The misclassification of exposures and outcomes and errors in continuous exposures result in biased estimates of associations between exposure and outcome. A particular consideration that arises in case-control studies is differential error or misclassification that depends on the outcome.
Relatively simple methods can be used to correct for misclassification in binary exposures, provided that there is information available on the sensitivity and specificity of the measured exposure, for example from a validation study. These methods extend to allow differential misclassification and additionally to allow for misclassification in binary outcomes.
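A minimal sketch of such a correction for a binary exposure is given below. It assumes that the sensitivity and specificity are known (for example from a validation study) and inverts the relation between the observed and true exposure proportions separately in cases and controls, so differential misclassification is allowed. The function names and all numerical values are hypothetical.

```python
def corrected_prevalence(p_obs, sens, spec):
    """Recover the true exposure proportion from the observed one.

    Uses p_obs = sens * p + (1 - spec) * (1 - p), solved for p.
    """
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("sens + spec must exceed 1 for the correction to work")
    p = (p_obs + spec - 1.0) / denom
    if not 0.0 < p < 1.0:
        raise ValueError("corrected proportion falls outside (0, 1)")
    return p

def corrected_odds_ratio(p_obs_cases, p_obs_controls,
                         sens_cases, spec_cases,
                         sens_controls, spec_controls):
    """Odds ratio after correcting exposure proportions in each group."""
    p1 = corrected_prevalence(p_obs_cases, sens_cases, spec_cases)
    p0 = corrected_prevalence(p_obs_controls, sens_controls, spec_controls)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

# Hypothetical example with non-differential error (same sens/spec in both groups).
print(corrected_odds_ratio(0.39, 0.22, 0.9, 0.95, 0.9, 0.95))
```

In this hypothetical example the odds ratio computed from the observed proportions is closer to 1 than the corrected one, as is typical of non-differential misclassification of a binary exposure. The sketch gives only a point estimate; it does not propagate uncertainty in the sensitivity and specificity.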
Error in continuous exposures arises in many areas of application and can take different forms. The form of the error influences its effect on the estimated association between exposure and outcome.
A commonly used method for correcting error in continuous exposures is regression calibration, which relies on an assumption of non-differential error. Correction methods that allow differential error include multiple imputation and moment reconstruction.
Preliminaries
In this chapter we discuss the effects of misclassification and measurement error and methods for making corrections for these effects. The focus is naturally on case-control studies, but much of the discussion and methods apply more generally. After some preliminary remarks, the chapter is divided broadly into three sections:
• misclassification of a binary or categorical exposure;
• misclassification of case-control status;
• error in the measurement of a continuous exposure.
Logistic regression can be used to estimate odds ratios using data from a case-control sample as though the data had arisen prospectively. This allows regression adjustment for background and confounding variables and makes possible the estimation of odds ratios for continuous exposures using case-control data.
The logistic regression of case-control data gives the correct estimates of log odds ratios, and their standard errors are as given by the inverse of the information matrix.
The logistic regression model is in a special class of regression models for estimating exposure-outcome associations that may be used to analyse case-control study data as though they had arisen prospectively. Another regression model of this type is the proportional odds model. For other models, including the additive risk model, case-control data alone cannot provide estimates of the appropriate parameters.
Absolute risks cannot be estimated from case-control data without additional information on the proportions of cases and controls in the underlying population.
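The reason can be sketched as follows for a single exposure x. The notation is an assumption made for illustration: \pi_1 and \pi_0 denote the probabilities that a case and a control, respectively, are included in the sample, and both are taken not to depend on x.

```latex
% Assumed population (prospective) logistic model:
\operatorname{logit} P(Y = 1 \mid X = x) = \alpha + \beta x .

% Under case-control sampling with inclusion probabilities \pi_1 (cases)
% and \pi_0 (controls), Bayes' theorem gives
P(Y = 1 \mid X = x, \text{sampled})
  = \frac{\pi_1 e^{\alpha + \beta x}}{\pi_0 + \pi_1 e^{\alpha + \beta x}},

% so that
\operatorname{logit} P(Y = 1 \mid X = x, \text{sampled})
  = \alpha + \log\frac{\pi_1}{\pi_0} + \beta x .
```

The slope \beta, the log odds ratio, is unchanged by the sampling, whereas the intercept is shifted by \log(\pi_1/\pi_0). Recovering \alpha, and hence absolute risks, therefore requires external information on the sampling fractions.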
Preliminaries
The previous chapters have introduced the key features of case-control studies but their content has been restricted largely to the study of single binary exposure variables. We now give a more general development. The broad features used for interpretation are as before:
• a study population of interest, from which the case-control sample is taken;
• a sampling model constituting the model under which the case-control data arise and which includes a representation of the data collection process;
• an inverse model representing the population dependence of the response on the explanatory variables; this model is the target for interpretation.
The case-subcohort design, often called simply the case-cohort design, is an alternative to the nested case-control design for case-control sampling within a cohort.
The primary feature of a case-subcohort study is the ‘subcohort’, which is a random sample from the cohort and which serves as the set of potential controls for all cases. The study comprises the subcohort plus all additional cases, that is, those not in the subcohort.
In an analysis using event times the cases are compared with members of the subcohort who are at risk at their event time, using a pseudo-partial likelihood. This results in estimates of hazard ratios.
An advantage of this design is that the same subcohort can be used to study cases of different types.
A simpler form of case-subcohort study disregards event times and is sometimes referred to as a case-base study or hybrid epidemiologic design. In this the subcohort enables estimation of risk ratios and odds ratios.
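A minimal sketch of the risk ratio estimate in this simpler design, for a binary exposure, is given below. It assumes that all cases in the cohort are observed and that the subcohort is a simple random sample of the cohort, so that the exposure odds in the subcohort estimate those in the whole cohort. The function name and counts are hypothetical.

```python
def case_base_risk_ratio(cases_exposed, cases_unexposed,
                         subcohort_exposed, subcohort_unexposed):
    """Risk ratio estimate from a case-base sample with a binary exposure.

    The exposure odds among cases divided by the exposure odds in the
    subcohort estimates P(Y=1 | X=1) / P(Y=1 | X=0).
    """
    counts = (cases_exposed, cases_unexposed,
              subcohort_exposed, subcohort_unexposed)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive")
    exposure_odds_cases = cases_exposed / cases_unexposed
    exposure_odds_cohort = subcohort_exposed / subcohort_unexposed
    return exposure_odds_cases / exposure_odds_cohort

# Hypothetical counts, for illustration only.
print(case_base_risk_ratio(30, 20, 100, 200))
```

Because the subcohort is sampled without regard to outcome, it may itself contain cases; such individuals contribute to both sets of counts here. Replacing the subcohort by its non-case members would instead give an odds ratio.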
Preliminaries
In this chapter we continue the discussion of studies described broadly as involving case-control sampling within a cohort. In the nested case-control design, discussed in Chapter 7, cases are compared with controls sampled from the risk set at each event time. A feature of the nested case-control design is that the sampled controls are specific to a chosen outcome and therefore cannot easily be re-used in studies of other outcomes of interest if these occur at different time points; in principle, at least, a new set of controls must be sampled for each outcome studied though some methods have been developed that do enable the re-use of controls.
The nested case-control design accommodates case event times into the sampling of controls.
In this design one or more controls are selected for each case from the risk set at the time at which the case event occurs. Controls may also be matched to cases on selected variables.
Nested case-control studies are particularly suited for use within large prospective cohorts, when it is desirable to process exposure information only for cases and a subset of non-cases.
The analysis of nested case-control studies uses a proportional hazards model and a modification to the partial likelihood used in full-cohort studies, giving estimates of hazard ratios. Extensions to other survival models are possible.
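The modified partial likelihood can be written down as a sketch. The notation here is assumed for illustration: t_j are the event times, i_j is the case at t_j, \tilde{R}_j is the sampled risk set consisting of that case and its selected controls, and x_k(t) is the exposure vector of individual k.

```latex
% Assumed proportional hazards model:
h_k(t) = h_0(t) \exp\{\beta^{T} x_k(t)\} .

% Partial likelihood for a nested case-control sample with
% randomly sampled controls:
L(\beta) = \prod_{j}
  \frac{\exp\{\beta^{T} x_{i_j}(t_j)\}}
       {\sum_{k \in \tilde{R}_j} \exp\{\beta^{T} x_k(t_j)\}} .
```

In a full-cohort analysis the sum in the denominator runs over the whole risk set at t_j rather than over the sampled risk set; this is the only change. Each factor has the same form as a conditional logistic regression contribution from a matched set.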
In the standard design, controls are selected randomly from the risk set for each case; however, more elaborate sampling procedures for controls, such as counter-matching, may gain efficiency. A weighted partial-likelihood analysis is needed to accommodate non-random sampling.
Preliminaries
We have focused primarily so far on case-control studies in which the cases and controls are sampled from groups of individuals who respectively do and do not have the outcome of interest occurring within a relevant time window. This time window is typically relatively short. If cases are to be defined as those experiencing an event or outcome of interest occurring over a longer time period, or if the rate of occurrence of the event is high, then the choice of a suitable control group requires special care.
In two-stage case-control designs, limited information is obtained on individuals in a first-stage sample and used in the sampling of individuals at the second stage, where full information on exposures and other variables is obtained. The first stage may be a random sample or a case-control sample; the second stage is a case-control sample, possibly within strata. The major aim of these designs is to gain efficiency.
Two-stage studies can be analysed using likelihood-based arguments that extend the general formulation based on logistic regression.
Special sampling designs for matched case-control studies include counter-matching, which uses some information on individuals in the potential pool of controls to select controls in such a way as to maximize the informativeness of the case-control sets.
Family groupings can be used in case-control-type studies, and there is a growing literature in the epidemiological, statistical and genetics fields. In one approach, cases are matched to a sibling or other relative.
Preliminaries
So far we have discussed case-control studies in which cases and controls are sampled, in principle at random, from the underlying population on the basis of their outcome status. We have also considered extensions, including matched studies and stratified sampling, in both of which it is assumed that some features of individuals in the underlying population are easily ascertained. Sometimes it is useful to consider alternative ways of sampling in a case-control study. In this chapter we discuss some special case-control sampling designs.
The cornerstone of the analysis of case-control studies is that the ratio of the odds of a binary outcome Y given exposure X = 1 to that given X = 0 is the same as the ratio of the odds where the roles of Y and X are reversed. This result means that prospective odds ratios can be estimated from retrospective case-control data.
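The symmetry can be checked directly. In the sketch below, p_{xy} = P(X = x, Y = y) denotes the joint probabilities in the population; this notation is introduced here for illustration.

```latex
% Prospective odds ratio (odds of Y given X):
\frac{P(Y=1 \mid X=1) / P(Y=0 \mid X=1)}
     {P(Y=1 \mid X=0) / P(Y=0 \mid X=0)}
  = \frac{p_{11} / p_{10}}{p_{01} / p_{00}}
  = \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}} .

% Retrospective odds ratio (odds of X given Y):
\frac{P(X=1 \mid Y=1) / P(X=0 \mid Y=1)}
     {P(X=1 \mid Y=0) / P(X=0 \mid Y=0)}
  = \frac{p_{11} / p_{01}}{p_{10} / p_{00}}
  = \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}} .
```

Both expressions reduce to the same cross-product ratio because the marginal probabilities cancel within each set of odds.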
For binary exposure X and outcome Y there are both exact and large-sample methods for estimating odds ratios from case-control studies.
Methods for the estimation of odds ratios for binary exposures extend to categorical exposures and allow the combination of estimates across strata. The latter enables control for confounding and background variables.
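One widely used way of combining estimates across strata is the Mantel-Haenszel estimator; the text above does not name a specific method, so its use here is an assumption. The sketch below takes one 2x2 table per stratum, with hypothetical cell labels and counts.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel summary odds ratio over a list of 2x2 tables.

    Each stratum is a tuple (a, b, c, d) with
    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("summary odds ratio is undefined for these tables")
    return numerator / denominator

# Two hypothetical strata, each with a within-stratum odds ratio of 2.
strata = [(10, 20, 5, 20), (30, 10, 15, 10)]
print(mantel_haenszel_odds_ratio(strata))
```

Unlike a simple average of stratum-specific odds ratios, this estimator remains usable when individual strata are small or contain zero cells, provided the overall denominator is non-zero.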
For binary exposure X and outcome Y, the probabilities of X given Y and of Y given X can be formulated using two different logistic regression models. However, the two models give rise to the same estimates of odds ratios under maximum likelihood estimation.
Rate ratios can be estimated from a case-control study if ‘time’ is incorporated correctly into the sampling of individuals; a simple possibility is to perform case-control sampling within short time bands and then to combine the results.
Preliminaries
Many central issues involved in the analysis of case-control data are illustrated by the simplest special case, namely that of a binary explanatory variable or risk factor and a binary outcome or response.
Case-control studies can involve more than two outcome groups, enabling us to estimate and compare exposure-outcome associations across groups.
Studies may involve multiple case subtypes and a single control group, or one case group and two or more control groups, for example.
Case-control studies with more than two outcome groups can be analysed using pairwise comparisons or special polychotomous analyses. The general formulation based on logistic regression extends to this situation, meaning that the data from such studies can be analysed as though arising from a prospective sample.
By contrast, in some situations case-only studies are appropriate; in these no controls are required. In one such situation the nature of the exposure contrasts studied may make the absence of controls reasonable. In another, each individual is in a sense his or her own control.
Preliminaries
In most of this book we suppose that there are just two possible outcomes for each individual, defining them as either cases or controls. However, there are two contrasting situations in which the number of outcome groups is not two. First, it may be of interest to estimate and compare risk factors for three or more outcomes; an extended case-control design can be used to make comparisons between more than two outcome groups. The other, very different, situation occurs when controls may be dispensed with: the case-only design. In the first part of this chapter we consider the former situation, and in the second part the latter.
• The individual matching of controls to cases in a case-control study may be used to control for confounding or background variables at the design stage of the study.
• Matching in a case-control study has some parallels with pair matching in experimental designs. It uses one of the most basic methods of error control, comparing like with like.
• The simplest type of matched case-control study takes a matched-pair form, in which each matched set comprises one case and one control.
• Matched case-control studies require a special form of analysis. The most common approach is to allow arbitrary variations between matched sets and to employ a conditional logistic regression analysis.
• An alternative analysis suitable in some situations uses a regression formulation based on the matching variables.
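For the matched-pair form with a single binary exposure, the conditional analysis reduces to a simple calculation on the discordant pairs, sketched below. The data layout, a list of (case exposure, control exposure) pairs coded 0/1, and the function name are assumptions made for illustration.

```python
import math

def matched_pair_odds_ratio(pairs):
    """Conditional odds ratio estimate from 1:1 matched pairs.

    `pairs` is a list of (case_exposed, control_exposed) tuples coded 0/1.
    Only discordant pairs contribute to the conditional likelihood.
    """
    n10 = sum(1 for case_x, control_x in pairs if case_x == 1 and control_x == 0)
    n01 = sum(1 for case_x, control_x in pairs if case_x == 0 and control_x == 1)
    if n10 == 0 or n01 == 0:
        raise ValueError("need discordant pairs of both kinds")
    or_hat = n10 / n01
    # Large-sample standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / n10 + 1 / n01)
    return or_hat, se_log_or

# Hypothetical data: 15 + 5 discordant pairs and 30 concordant pairs.
pairs = [(1, 0)] * 15 + [(0, 1)] * 5 + [(1, 1)] * 10 + [(0, 0)] * 20
print(matched_pair_odds_ratio(pairs))
```

Concordant pairs drop out of the estimate entirely, which reflects the fact that the conditional analysis allows an arbitrary effect for each matched set.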
Preliminaries
An important and quite often fruitful principle in investigating the design and analysis of an observational study is to consider what would be appropriate for a comparable randomized experiment. What steps would be taken in such an experiment to achieve secure and precise conclusions? To what extent can these steps be followed in the observational context and what can be done to limit the loss of security of interpretation inherent in most observational situations?
In Chapter 2 we studied the dependence of a binary outcome, Y, on a single binary explanatory variable, the exposure, X.
The accumulation and possible combination of evidence from different studies, as contrasted with an emphasis on obtaining individually secure studies, is crucial to producing convincing evidence of generality of application and interpretation.
Methods for combining estimates across studies include those that can be applied when the full data from individual studies are available and those that can be applied when only summary statistics are available. They make allowance for heterogeneity of the effect of interest. The methods are general and not specific to case-control studies.
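When only summary statistics are available, one common approach is inverse-variance pooling of log odds ratios with a random-effects allowance for heterogeneity. The sketch below uses the DerSimonian-Laird moment estimate of the between-study variance; this particular choice, the function name and the numbers are assumptions for illustration rather than the method specified in the text.

```python
def pool_estimates(estimates, variances):
    """Random-effects pooling of study estimates (e.g. log odds ratios).

    Returns (pooled estimate, its variance, between-study variance tau2),
    with tau2 estimated by the DerSimonian-Laird moment method.
    """
    k = len(estimates)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two estimates with matching variances")
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) estimate, used to measure heterogeneity.
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each within-study variance.
    weights_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(weights_re)
    pooled = sum(w * y for w, y in zip(weights_re, estimates)) / sum_w_re
    return pooled, 1.0 / sum_w_re, tau2

# Hypothetical log odds ratios and variances from three studies.
print(pool_estimates([0.5, 0.8, 0.3], [0.04, 0.09, 0.06]))
```

When the estimated between-study variance is zero the result coincides with the fixed-effect inverse-variance estimate.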
Some special considerations may be required for the combining of results from case-control studies and full cohort studies or from matched and unmatched case-control studies.
Preliminaries
The emphasis throughout this book has been on analysis and design aimed to produce individually secure studies. Yet in many fields it is the accumulation of evidence of various kinds and from various studies that is crucial, sometimes to achieve appropriate precision and, often of even more importance, to produce convincing evidence of generality of application and interpretation. We now consider briefly some issues involved, although most of the discussion is not particularly specific to case-control studies. The term meta-analysis is often used in this context. The distinctive issues are, however, not those of analysis but more those of forming appropriate criteria for the inclusion of data and of assessing how meaningfully comparable different studies really are. Important issues of the interpretation of apparent conflicts of information may be best addressed by the traditional method of descriptive review.