Introduction
Typical genome-wide association studies (GWAS) (e.g., Wellcome Trust Case Control Consortium, 2007) measure hundreds of thousands, or millions, of genetic variants (typically single-nucleotide polymorphisms, or SNPs) in hundreds, thousands, or tens of thousands of individuals, with the primary goal of identifying which regions of the genome harbor SNPs that affect some phenotype or outcome of interest. Although many GWAS are case-control studies, here we focus primarily on the computationally simpler setting where a continuous phenotype has been measured on population-based samples, before briefly considering the challenges of extending these methods to binary outcomes.
Most existing GWAS analyses are “single-SNP” analyses, which simply test each SNP, one at a time, for association with the phenotype. Strong associations between a SNP and the phenotype are interpreted as indicating that the SNP, or a nearby correlated SNP, likely affects the phenotype. The primary rationale for GWAS is the idea that by examining these SNPs in more detail – for example, examining which genes they are located within or near – we may glean important insights into the biology of the phenotype under study.
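To make the one-at-a-time testing concrete, the following is a minimal sketch of a single-SNP scan for a continuous phenotype. Everything in it is an assumption made for illustration rather than something taken from the text: the data are simulated, genotypes are coded additively as 0/1/2 copies of an allele, each test is a simple linear regression, and the p-value uses a large-sample normal approximation to the t statistic. The function names are hypothetical, and this is not the procedure implemented in PLINK or BIMBAM.

```python
import math
import random


def single_snp_test(genotypes, phenotype):
    """Regress the phenotype on one SNP (simple linear regression).

    Returns (slope, t_statistic, approx_p_value). The p-value is a
    two-sided normal approximation, an assumption suited to large samples.
    """
    n = len(phenotype)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotype))
    if sxx == 0 or syy == 0:
        # Monomorphic SNP or constant phenotype: no test is possible.
        return 0.0, 0.0, 1.0
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    r2 = min(r * r, 1.0 - 1e-12)  # guard against division by zero
    t = r * math.sqrt((n - 2) / (1.0 - r2))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return slope, t, p


def simulate(n_individuals, n_snps, causal_index, effect, seed=0):
    """Simulate independent SNPs and a phenotype driven by one causal SNP."""
    rng = random.Random(seed)
    maf = 0.3  # hypothetical minor allele frequency shared by all SNPs
    snps = [
        [int(rng.random() < maf) + int(rng.random() < maf)
         for _ in range(n_individuals)]
        for _ in range(n_snps)
    ]
    phenotype = [
        effect * snps[causal_index][i] + rng.gauss(0.0, 1.0)
        for i in range(n_individuals)
    ]
    return snps, phenotype


snps, phenotype = simulate(n_individuals=500, n_snps=20, causal_index=3, effect=1.0)

# The scan itself: test every SNP separately, ignoring all the others.
results = [single_snp_test(snp, phenotype) for snp in snps]

# Rank SNPs by strength of association (smallest p-value first).
ranking = sorted(range(len(results)), key=lambda j: results[j][2])
for j in ranking[:3]:
    slope, t, p = results[j]
    print(f"SNP {j}: slope={slope:.3f}, t={t:.2f}, approx p={p:.3g}")
```

In this simulation the SNPs are independent, so the causal SNP should stand out on its own. In real data, nearby SNPs are correlated, which is why a strong single-SNP association may reflect a nearby correlated SNP rather than the tested SNP itself.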
Single-SNP Analysis Has Difficulties in Assessing Overall Association Signals
Single-SNP analysis appears to be clean, clear, and easy to perform with standard software packages such as PLINK (Purcell et al., 2007) and BIMBAM (Guan and Stephens, 2008). However, it is limited in its ability to answer questions about the collective strength of association signals in the data.