Genome-wide association (GWA) analyses are studies in which genomic variation, often measured by single nucleotide polymorphism (SNP) markers, is correlated with production, health and other traits of interest to identify candidate loci that regulate them. They can be viewed as the natural evolution of candidate gene association studies (Singer, 2009), although they focus on thousands or millions of SNPs without reference to any particular gene (Benyamin et al., 2009). Genomic data have been released by human (International HapMap Consortium, 2005; Sachidanandam et al., 2001), livestock (International Chicken Genome Sequencing Consortium, 2004), laboratory (Gibbs et al., 2004) and wild species (Li et al., 2010; Scally et al., 2012) genome projects. Since the first successful GWA study was published in 2005 (Klein et al., 2005), this methodology has been a key tool for the study of common genetic variation in complex traits.
As noted by Wang et al. (2010), GWA studies have succeeded in identifying phenotype-associated genetic markers, but pinpointing causal mutations in subsequent fine-mapping studies remains a challenge. Although marker SNPs are usually not the causal mutation (Wang et al., 2010), GWA methodology relies on the assumption that (a) linkage disequilibrium (LD) enables one or a few SNPs to act as surrogate markers for association and (b) these markers are located near the causal genetic variant. Nevertheless, the extent of LD in mammalian genomes (Tenesa et al., 2004; Sargolzaei et al., 2008) tends to reveal significant SNPs across several contiguous cM. As a consequence, this extends the genomic region potentially harbouring causal mutations and enlarges the list of candidate genes to be tested. Moreover, LD among SNPs may produce marginal associations at markers that are not themselves in direct LD with the causal mutation (He & Lin, 2011). The unprecedented potential for false-positives shown by GWA studies (Pearson & Manolio, 2008) must be viewed as a controversial challenge inherent to this methodology.
Analytical approaches for GWA studies must appropriately account for LD among genetic markers. In this article, the methodology developed by Zeng (1994) and Jansen & Stam (1994) for quantitative trait loci (QTL) mapping has been adapted to improve both the precision and efficiency of GWA studies. The main idea relies on the inclusion of additional (possibly linked) genetic markers when testing for a specific marker; this should benefit from the statistical properties of multiple regression analysis, which were previously reviewed by Rodolphe & Lefort (1993) and Zeng (1993; 1994) within the context of QTL analysis. Nevertheless, dissimilarities between GWA and QTL analyses (Kemper et al., 2012) indicate that the advantages previously reported for linkage analysis methods (Zeng, 1994) cannot be directly extrapolated to GWA approaches, and the statistical properties inherent to our modified GWA approach must be assessed in detail.
This article focuses on two major objectives. First, the multiple regression analysis from Zeng (1994) and Jansen & Stam (1994) has been adapted to the GWA framework. The analytical approach was implemented in Fortran90 programs and is available upon request from the first author of this article (J. Casellas). Second, the statistical performance of this modified GWA methodology has been evaluated on simulated data sets by testing different scenarios; different simulation (e.g., QTL effects and allelic frequencies) and analytical parameters (e.g., LD between competing SNPs and genomic regions involved in the analysis) were evaluated.
2. Materials and methods
(i) Composite GWA analysis
Take as a starting point a sample of n individuals with phenotypic information for a given quantitative trait. Moreover, assume that all individuals are genotyped for m biallelic genetic markers, and these markers are more or less evenly distributed across the genome. Under a standard approach, the analysis of the additive association effect inherent to the kth marker can be carried out by the following model:

$$y_i = \mu + \beta_k x_{ik} + e_i$$
where $y_i$ is the phenotypic record collected from the ith individual, $\mu$ is the population mean, $\beta_k$ is the additive association effect of the kth marker, $x_{ik}$ is an indicator variable taking values of −1 (homozygote), 0 (heterozygote) and 1 (opposite homozygote), and $e_i$ is the residual term. Within the context of a composite GWA, the previous model generalizes to:

$$y_i = \mu + \beta_k x_{ik} + \sum_{j \in J} \beta_j^{*} x_{ij} + e_i$$
where $\beta_j^{*}$ is the partial regression coefficient of the jth marker in set J, and $x_{ij}$ is an indicator variable (see $x_{ik}$). Focusing on a given marker k, note that $\beta_k$ and $\beta_k^{*}$ are both regression coefficients, although their interpretations are quite different. Whereas $\beta_k$ estimates the effect of the kth genetic marker on the phenotypic trait after accounting for the remaining competing markers, $\beta_k^{*}$ must be viewed as a nuisance parameter. From a general point of view, we assume that any marker (i.e., j) included in J must satisfy that (a) j ≠ k, (b) marker j is located no farther away from k than δ cM (i.e., analytical window) and (c) the LD between markers j and k falls within a range of values with predefined boundaries $\tau_1$ and $\tau_2$ (0 ⩽ $\tau_1$ < $\tau_2$ ⩽ 1). Although additional sources of variation are summarized in the $\mu$ term of the previous model, the model can be expanded to accommodate additional factors influencing the phenotypic trait.
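The three membership conditions for set J can be sketched as a short filter. This is only an illustration, not the authors' Fortran90 implementation: the function name `competing_set`, its arguments and the toy values are hypothetical, and the LD boundaries are treated as inclusive.

```python
def competing_set(k, positions_cm, r2_with_k, delta_cm, tau1, tau2):
    """Return the indices j that qualify as competing markers for tested marker k.

    positions_cm: map position (cM) of every marker.
    r2_with_k:    LD (r^2) between every marker and marker k.
    All names here are illustrative assumptions, not taken from the article.
    """
    selected = []
    for j, (pos, r2) in enumerate(zip(positions_cm, r2_with_k)):
        if j == k:
            continue  # condition (a): j != k
        if abs(pos - positions_cm[k]) > delta_cm:
            continue  # condition (b): within the analytical window
        if not (tau1 <= r2 <= tau2):
            continue  # condition (c): LD inside [tau1, tau2]
        selected.append(j)
    return selected


# Toy example with four markers; marker 1 is the tested SNP.
positions = [0.0, 5.0, 12.0, 40.0]
r2 = [0.30, 1.0, 0.95, 0.20]
J = competing_set(1, positions, r2, delta_cm=10.0, tau1=0.1, tau2=0.9)
```

In this toy case only marker 0 is retained: marker 2 is excluded by its high LD and marker 3 lies outside the window.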
(ii) Simulation process
Each simulated population evolved without selection for 1000 non-overlapping generations with an effective population size ($N_e$) of 100. In order to mimic a polygynous-like species, which is common under current livestock practices, generation 1001 expanded to 1000 individuals, with 200 males and 800 females. Note that this design expanded $N_e$ to 640 individuals (Wright, 1931). Generation 1001 was randomly mated to obtain 1000 individuals in generation 1002.
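The effective size quoted for generation 1001 follows from Wright's formula for unequal numbers of breeding males and females, $N_e = 4 N_m N_f / (N_m + N_f)$. A minimal check (the function name is ours):

```python
def effective_size(n_males, n_females):
    """Wright's effective population size under an unequal sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)


ne_1001 = effective_size(200, 800)  # 4 * 200 * 800 / 1000
```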
Each individual had a 100-cM chromosome with 2000 biallelic SNPs (one SNP every 0·05 cM) and a unique QTL located at cM 50. This initial density of SNPs matched previous research (Habier et al., 2009; Casellas & Varona, 2011) and fell within the range of lower (⩽500 markers/M; Meuwissen et al., 2001; Ødegård et al., 2009) and higher (6000 to ~10 000 SNPs/M; Ibáñez-Escriche et al., 2009; Toosi et al., 2010) SNP densities reported in the scientific literature. Founder individuals were homozygous throughout the whole genome for the wild-type allele (i.e., allele 1), and alleles switched from 1 to 2 (or vice versa) according to the following mutation rates. The QTL was affected by a mutation rate of 2·5 × $10^{-5}$ in all generations (Meuwissen et al., 2001), whereas SNPs had a mutation rate of 2·5 × $10^{-3}$ from generation 1 to 900 (Meuwissen et al., 2001) to guarantee a high percentage of polymorphic markers. This parameter was reduced to a more realistic 2·5 × $10^{-8}$ for subsequent generations (Hickey & Gorjanc, 2012). Chromosome recombination followed Kosambi's mapping function (Kosambi, 1943).
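For reference, Kosambi's mapping function converts a map distance d (in Morgans) into a recombination fraction, r = 0·5 tanh(2d). The sketch below only illustrates that conversion for distances given in cM; how the authors' simulation program actually sampled crossovers is not described in the text, and the function name is ours.

```python
import math


def kosambi_recombination(distance_cm):
    """Recombination fraction for a map distance in cM (Kosambi's function)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)


r_adjacent = kosambi_recombination(0.05)   # adjacent SNPs, 0.05 cM apart
r_ends = kosambi_recombination(100.0)      # both ends of the 100-cM chromosome
```

For very short intervals the recombination fraction is practically equal to the map distance, whereas for distant loci it approaches (but never reaches) 0·5.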
Genomic data from all individuals born in generation 1002 were stored and checked. The minor allele frequency (MAF) was calculated for each locus in order to validate the two following restrictions: (a) QTL with MAF ⩾ 0·25, and (b) 900 to 1100 SNPs with MAF ⩾ 0·05. Only those populations satisfying these criteria were retained for further analyses. The restriction applied to the QTL aimed to limit differences across populations in the genetic variability contributed by the QTL to the overall phenotypic variance, whereas the restriction on the number of polymorphic SNPs aimed to homogenize the number of potential competing genetic markers across populations (see below). Given that 100 populations were required for each scenario (see below) and the rejection rate could not be anticipated, simulations were performed back to back until 100 valid populations were available. At the end of the simulation process, the rejection rate was 79%. A unique phenotypic record was generated for each individual born in generation 1002. This resulted from the additive allelic effects of the QTL and a random value sampled from a standard normal distribution (mean 0 and variance 1). Four different additive allelic effects (α) were assumed for the mutant-type allele (α = 0, 0·25, 0·5 and 1), whereas the additive effect of the wild-type allele was null. The non-null values were chosen in order to simulate QTLs with small ($h^2$ ~ 0·03), moderate ($h^2$ ~ 0·13) and large ($h^2$ ~ 0·33) contributions to the phenotypic variance.
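The link between α and the share of phenotypic variance explained by the QTL can be approximated with a short calculation. The sketch assumes Hardy-Weinberg proportions, a purely additive QTL with variance 2p(1 − p)α², and a residual variance of 1; the mutant-allele frequency p is an input we choose for illustration, since the realized frequencies in the simulated populations are not reported.

```python
def qtl_heritability(alpha, p, residual_var=1.0):
    """Approximate proportion of phenotypic variance explained by one additive QTL.

    alpha: additive effect of the mutant allele (wild-type effect is 0).
    p:     frequency of the mutant allele (illustrative assumption).
    """
    genetic_var = 2.0 * p * (1.0 - p) * alpha ** 2
    return genetic_var / (genetic_var + residual_var)


for alpha in (0.25, 0.5, 1.0):
    print(alpha, round(qtl_heritability(alpha, p=0.5), 3))
```

With p = 0·5 this gives roughly 0·03, 0·11 and 0·33, in line with the approximate small, moderate and large contributions quoted above; other allele frequencies within the MAF ⩾ 0·25 restriction give somewhat smaller values.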
(iii) Analytical process
Different scenarios were generated by combining the additive allelic effect of the QTL, the analytical window and the LD range between tested and competing SNPs. The LD between SNPs ($r^2$) was calculated as the squared correlation of the alleles (Hill & Robertson, 1968). Both the analytical window (10, 30 and 50 cM on each side of the tested SNP) and the LD range (0·1 ⩽ $r^2$ ⩽ 0·9, 0·1 ⩽ $r^2$ ⩽ 0·5 and 0·5 ⩽ $r^2$ ⩽ 0·9) had three different values (or ranges), leading to a total of 36 combinations when crossed with the four values of α. It is important to note that SNPs with high LD ($r^2$ > 0·9) were discarded to prevent identifiability problems in the analytical model, whereas SNPs with low LD ($r^2$ < 0·1) were also discarded to restrict the number of competing SNPs included in the model.
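As an illustration of the LD measure, the squared correlation between alleles at two SNPs can be computed from phased haplotypes coded 0/1. This is a minimal sketch under the assumption that phase is known (as it is in simulated data); the function name and the toy haplotypes are ours.

```python
def ld_r2(hap_a, hap_b):
    """Squared allelic correlation (r^2) between two biallelic loci.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return 0.0  # at least one locus is monomorphic; r^2 is undefined
    return d * d / denom


r2_complete = ld_r2([0, 1, 0, 1], [0, 1, 0, 1])  # identical loci
r2_none = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])      # independent loci
```

Returning 0 for monomorphic loci is a convenience choice here, so that such markers simply fall outside any LD range used to select competing SNPs.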
Within each population, SNP-by-SNP analyses were performed twice by applying both the standard GWA and the composite GWA models described above. This duplicate analysis aimed to characterize the statistical performance of composite GWA analysis against a well-known analytical approach. Both models were solved by Gauss-Seidel iteration (Mrode, 2005) and the significance of $\beta_k$ was tested by a likelihood ratio test (Neyman & Pearson, 1933) with one degree of freedom. Results were discussed on the basis of three different levels of significance. Although ~1000 SNPs were tested within a population, the standard p < 0·05 (uncorrected for multiple testing) was taken as the most liberal significance threshold. At the other extreme, Bonferroni's (1936) correction assuming 1000 independent tests was applied as the most stringent threshold (p < 0·00005). Between them, an intermediate significance level was defined by the Benjamini & Hochberg (1995) approach with (on average) p < 0·0005.
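A one-degree-of-freedom likelihood ratio test of this kind can be sketched as follows. The sketch assumes Gaussian residuals with maximum-likelihood variance estimates, so that the statistic reduces to n ln(RSS reduced / RSS full); the authors' Fortran90 code may compute it differently. Function names are ours, and the upper tail of the chi-square distribution with one degree of freedom is obtained from the complementary error function.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def lrt_p_value(rss_reduced, rss_full, n):
    """P-value for dropping one regression coefficient from a Gaussian linear model.

    rss_reduced: residual sum of squares without beta_k.
    rss_full:    residual sum of squares with beta_k.
    n:           number of phenotypic records.
    """
    statistic = n * math.log(rss_reduced / rss_full)
    return chi2_sf_1df(statistic)


n_tests = 1000
bonferroni_threshold = 0.05 / n_tests  # the p < 0.00005 boundary
```

For example, `lrt_p_value(1010.0, 1000.0, 1000)` corresponds to a statistic of about 9·95, which would pass the uncorrected p < 0·05 threshold but not the Bonferroni one.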
From each simulation scenario, differences between the composite GWA model and the standard GWA model were evaluated in terms of statistical power (i.e., probability of identifying significantly associated SNPs when α > 0) and specificity (i.e., probability of no SNPs being significant when α = 0) on a chromosomal level, as well as precision. More specifically, precision was evaluated in terms of the total number of significant SNPs, the average absolute distance between significant SNPs and the QTL, and the percentage of significant SNPs located not farther than 2·5 cM from the QTL.
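The three precision measures can be computed per population from the map positions of the significant SNPs. The following sketch uses hypothetical names and toy positions; the QTL position (cM 50) and the 2·5 cM radius are taken from the text.

```python
def precision_summary(sig_positions_cm, qtl_cm=50.0, radius_cm=2.5):
    """Summarize precision for one population.

    Returns (number of significant SNPs,
             mean absolute distance to the QTL in cM,
             percentage of significant SNPs within radius_cm of the QTL).
    The last two entries are None when no SNP is significant.
    """
    n = len(sig_positions_cm)
    if n == 0:
        return 0, None, None
    distances = [abs(pos - qtl_cm) for pos in sig_positions_cm]
    mean_distance = sum(distances) / n
    pct_near = 100.0 * sum(d <= radius_cm for d in distances) / n
    return n, mean_distance, pct_near


# Toy example: four significant SNPs at these map positions (cM).
summary = precision_summary([48.0, 50.5, 52.5, 60.0])
```

Populations without any significant SNP contribute to power or specificity only, which is why the distance-based measures are left undefined for them here.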
3. Results
(i) Power and specificity
On the basis of the simulation process described above, the average number of competing SNPs varied depending on LD requirements and slightly decreased with the width of the analytical window where competing SNPs were assessed (Fig. 1). The widest range of LD between the tested and competing SNPs (0·1 ⩽ $r^2$ ⩽ 0·9) included an average of ~42 competing SNPs in the model, although minimum and maximum estimates for the within-chromosome average number of competing SNPs were 34·0 and 50·3, respectively. On average, ~36 competing SNPs were accounted for in the model when LD was restricted to 0·1 ⩽ $r^2$ ⩽ 0·5, whereas higher LD (0·5 ⩽ $r^2$ ⩽ 0·9) reduced the average number of competing SNPs to ~6 (Fig. 1). It is important to note that differences due to the width of the genomic window where SNPs were assessed were minimal.
Competing SNPs included in the analytical model represent a relevant increase in the number of parameters to be inferred; therefore, they may influence both the power (i.e., probability of identifying significantly associated SNPs when α > 0) and specificity (i.e., probability of no SNP being significant when α = 0) of the test. Power decreased for larger numbers of competing SNPs and, as anticipated, increased with the magnitude of the QTL effect (Table 1). More relevant than these trends is the comparison between composite GWA and standard GWA. The smaller number of model parameters in standard GWA analyses provided the highest power, and this approach only failed to reveal significantly (p < 0·0005) associated SNPs in some simulated populations when α = 0·25 (Table 1). On the contrary, competing SNPs penalized composite GWA in terms of statistical power, as evidenced for small-effect QTLs (α = 0·25). This penalty was attenuated for medium-effect QTLs (α = 0·50), and composite GWA almost matched the statistical power of standard GWA analyses when testing for large-effect QTLs (α = 1·00) (Table 1).
Table 1 footnotes:
a GWAS: standard genome-wide association analysis; GWASc: composite genome-wide association analysis.
b AW: width of the analytical window where competing SNPs were assessed.
c LD range: range of linkage disequilibrium between competing SNPs and the tested SNP.
d –: not applicable.
Specificity improved under composite GWA: this approach returned no significant (p < 0·0005) associations in any replicate, regardless of LD range and analytical window. Standard GWA had 94% specificity; that is, it identified one or more significant (p < 0·0005) SNPs in 6% of the simulated populations under null QTL effects (Table 1). The difference was even more drastic when multiple testing correction was not applied. On average, the composite GWA approach reached ~20% specificity, whereas standard GWA returned significant (p < 0·05) SNPs in all simulated populations.
(ii) Refining QTL-associated genomic regions
The average number of significant (p < 0·0005) SNPs (lower and upper boundaries) under standard GWA depended on the magnitude of the simulated QTL effect, this being 18·6 (1 to 56) for α = 0·25, 64·0 (17 to 137) for α = 0·50 and 205·3 (93 to 318) for α = 1·00 (Fig. 2). These averages decreased drastically under composite GWA; this approach identified a maximum of seven significant (p < 0·0005) SNPs when α = 0·25, increasing up to 30 SNPs for α = 0·50. Under large-effect QTLs, the average number of significant SNPs was less than a third of the number of significant SNPs under standard GWA. Moreover, this scenario revealed remarkable differences depending on the range of LD for competing SNPs. The average number of significant SNPs was clearly larger when 0·5 ⩽ $r^2$ ⩽ 0·9 (~63 SNPs), whereas 0·1 ⩽ $r^2$ ⩽ 0·5 revealed ~15 significant SNPs and 0·1 ⩽ $r^2$ ⩽ 0·9 reduced this number to ~8 significant SNPs (Fig. 2).
Results on the average absolute distance between the QTL and every significant SNP are shown in Fig. 3. This distance was larger for α = 1 than for smaller QTL effects and slightly increased for smaller analytical windows. Differences between standard and composite GWA were almost negligible under α ⩽ 0·5. Only QTLs with α = 1 suggested that standard GWA approaches associated SNPs more distant from the QTL (11·8 cM) than composite GWA (~8·8 cM), although lower and upper boundaries overlapped (Fig. 3). A similar pattern was revealed when checking the percentage of significant SNPs located not farther than 2·5 cM from the QTL (Fig. 4). Small- and medium-effect QTLs did not reveal relevant differences between standard and composite GWA (results not shown), whereas simulations under α = 1 suggested advantages when applying composite GWA. On average, the standard GWA identified 20·8% of the significant SNPs in the nearest 2·5 cM around the QTL, whereas this average percentage rose to values larger than 30% when applying composite GWA analysis (Fig. 4).
4. Discussion
This research contributes a novel approach for GWA analyses that increases the precision for locating causal mutations, but at the expense of analytical power. Accurate genome-wide association methodologies are of special relevance in the current genomics era, where large amounts of sequence data are becoming available. The composite GWA approach described in this article focuses on the main idea of including additional (i.e., competing) genetic markers when testing for association effects inherent to a given SNP; although it could be viewed as an over-parameterization of the analytical model, this approach tries to narrow the genomic region where QTL-associated effects can be detected by appropriate SNPs in LD with the causal mutation. Competing SNPs must account for marginally associated effects, as was previously shown by Zeng (1994) and Jansen & Stam (1994) within the context of QTL mapping, this being generalized to genome-wide markers by Bernardo (2013).
The composite GWA approach was developed on the basis of multiple regression; focusing on the GWA scenario, various properties inherent to the multiple regression methodology must be revisited before discussing the results obtained under simulation. As noted by Zeng (1993; 1994) and previously demonstrated by Stam (1991), the expected partial regression coefficient of the analysed trait on the ith SNP depends only on those causal mutations that are located in the interval between the neighbouring SNPs $i-1$ and $i+1$, both of which are accounted for in the analytical model (property 1). This is a very desirable property that characterizes the composite GWA approach as an interval test. Note that standard GWA analyses focusing on SNP-by-SNP approaches are less precise unconditional tests, in which we can only check whether there is one or more causal mutations on a chromosome (Jensen, 1993). Multiple regression analysis allows for conditioning on both unlinked and linked markers, which reduces the sampling variance of the test statistic (property 2) and the chance of interference of possible multiple-linked QTLs (property 3), respectively (Rodolphe & Lefort, 1993; Zeng, 1993; 1994). Property 2 derives from the evidence that unlinked markers can account for some residual genetic variation and, as a consequence, increase the statistical power of the test. Nevertheless, property 3 may counteract the increase in power because of the increase in the sampling variance inherent to conditional testing (Zeng, 1993). Finally, it has been shown that partial regression coefficients on two markers in a multiple regression analysis are generally uncorrelated, unless the two markers are adjacent; even in this case, the correlation is usually very small (Zeng, 1993).
As shown by the results obtained on simulated populations, the statistical properties of the composite GWA characterized a compromise between precision (properties 1 and 2) and power (property 3) of the association test. Indeed, power loss was quite relevant, as evidenced by the percentage of simulated populations where composite GWA failed to detect significantly associated SNPs (Table 1). Differences between standard and composite GWA analyses were minimal when checking large-effect QTLs, whereas power loss was faster for composite than for standard GWA when the effect of the QTL decreased (Fig. 5). This is simply evidence that the implementation of a composite GWA approach comes at a particularly high price in terms of power, discouraging the systematic use of composite GWA models if medium- to small-effect QTLs are anticipated. Indeed, composite GWA studies must be viewed as a refining methodology to be implemented after confirming the presence of significantly associated SNPs by standard GWA analysis. If not, genomic research could be impaired by a massive incidence of false-negatives due to an excessive zeal to refine the location of causal QTLs before roughly identifying their presence and approximating their additive genetic effect. Far from discouraging the implementation of composite GWA analyses, this conclusion warns future users about the consequences of power loss when screening genomic data for association effects.
The composite GWA model developed above assumed two highly flexible model parameters when selecting competing SNPs. Both the width of the analytical window where competing SNPs were assessed and the LD range between the tested SNPs and competing SNPs could be modified and adapted to different scenarios. This allowed for the evaluation of their impact on model performance as well as for preliminary recommendations for further genomic analyses. The analytical window had a small impact on the number of competing SNPs (Fig. 1) as well as on the precision of the composite GWA analysis (Figs 2 to 4). This could be mainly due to the relatively small extent of LD in mammalian genomes (Tenesa et al., 2004; Sargolzaei et al., 2008), which was mimicked in our simulated chromosomes. Nevertheless, the small advantages shown by wider windows would suggest that, if not conflicting with computing requirements, the wider the better. A similar reasoning applies to the preferable range of LD between the tested SNP and competing SNPs. As suggested by properties 2 and 3, the inclusion of both lowly and highly linked competing SNPs could contribute remarkable advantages (and some disadvantages mainly linked to power loss) to the composite GWA. Nevertheless, and compared with the remaining composite GWA parameterizations, the wider interval (0·1 ⩽ $r^2$ ⩽ 0·9) neither remarkably reduced the average absolute distance between significant SNPs and the QTL (Fig. 3) nor increased the percentage of significant SNPs located in the nearest 2·5 cM around the QTL (Fig. 4), although it did suffer from larger power loss. A similar pattern was shown by 0·1 ⩽ $r^2$ ⩽ 0·5. Within this context, the LD interval characterized by 0·5 ⩽ $r^2$ ⩽ 0·9 could be viewed as an appealing alternative where the loss of statistical power was attenuated.
The composite GWA model contributes a novel point of view for GWA analyses, in which testing is circumscribed to the genomic region flanking each SNP (delimited by the nearest competing SNPs) and conditioning on linked markers increases the precision of locating causal mutations, but possibly at the expense of power.
This research was supported by the research project AGL2010-21176/GAN. The research contract of J. Casellas was partially funded by the Ministerio de Ciencia e Innovación of Spain's government (reference RYC-2009-04049).
Declaration of interest