## Methods

### Test statistic for detecting a QTL

Consider an *F* _{2} population or its progeny populations produced by further selfing and/or intercrossing the *F* _{2} individuals for different numbers of generations. There are three possible genotypes, *P* _{1} homozygote, heterozygote and *P* _{2} homozygote for any gene. Let *Q* _{j}*Q* _{j}, *Q* _{j}*q* _{j} and *q* _{j}*q* _{j} be the three possible genotypes of a QTL, say *Q* _{j}, under consideration in a population. For an individual *i* in a random sample with size *n*, let *x* _{ij}* represent the coded variable of QTL genotype as

$$
x_{ij}^{*}=\begin{cases}1, & \text{if the QTL genotype is } Q_jQ_j,\\ 0, & \text{if the QTL genotype is } Q_jq_j,\\ -1, & \text{if the QTL genotype is } q_jq_j,\end{cases}\qquad(1)
$$

and *a* _{j} denote its additive effect. Similarly, it is straightforward to construct the coded variable *x* _{ik}* for another QTL, say *Q* _{k}, with additive effect *a* _{k} for a model taking multiple QTLs into account. When *Q* _{j}, flanked by the left marker *M* _{j} and right marker *N* _{j} with alleles (*M* _{j}, *m* _{j}) and (*N* _{j}, *n* _{j}), is considered, the conditional expectation of *x* _{ij}* given *M* _{j} and *N* _{j}, *w* _{ij}=*E*(*x* _{ij}*|*M* _{j}, *N* _{j}), is used as the predictor variable in the REG interval mapping model (Haley & Knott, 1992). For a single QTL model, Hu & Xu (2008) have shown that the test statistic

$$
F=\frac{\hat a_j^{2}\sum_{i=1}^{n}\left(w_{ij}-\bar w_{j}\right)^{2}}{\sigma^{2}}\qquad(2)
$$

(σ^{2} is the residual error variance) follows a central *F*-distribution under the null hypothesis (*H* _{0}: *a* _{j}=0). Under the alternative hypothesis (*H* _{1}: *a* _{j}≠0), this test statistic follows a non-central *F*-distribution with the non-centrality parameter δ=*n*×var(*w* _{ij})×*a* _{j}^{2}/σ^{2}. The non-centrality parameter is a function of several important factors: sample size, variance of the predictor variable, QTL effect and residual error variance. By analysing these factors, the power for detecting a QTL can be predicted for different situations. For example, Hu & Xu (2008) analysed one of the key factors, the variance of the coded variable, var(*w* _{ij}), by deriving its different formulations for the BC, *F* _{2}, RIL and DH. When *n* is sufficiently large, ∑(*w* _{ij}−*w̄* _{j})^{2}≈*n*×var(*w* _{ij}), where *w̄* _{j} is the sample mean of the *w* _{ij}, and the variance of the estimated effect is var(*â* _{j})=σ^{2}/[*n*×var(*w* _{ij})]. When var(*w* _{ij}) is small, var(*â* _{j}) is large and δ is small, leading to lower power in QTL detection.
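As a rough illustration of how the non-centrality parameter drives power, the following sketch replaces the non-central *F*-distribution with a large-sample normal approximation, an assumption made here so that only the Python standard library is needed. The function names and the input values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def noncentrality(n, var_w, a, sigma2):
    """delta = n * var(w_ij) * a_j^2 / sigma^2, as defined in the text."""
    return n * var_w * a ** 2 / sigma2


def approx_power(delta, alpha):
    """Two-sided power under a normal approximation (assumption: large n).

    The standardized estimate is treated as N(sqrt(delta), 1) under H1,
    which approximates the non-central F test with 1 numerator df.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = delta ** 0.5
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Illustrative inputs only: a smaller var(w_ij) lowers delta and the power.
    for var_w in (0.50, 0.45, 0.30):
        d = noncentrality(n=250, var_w=var_w, a=0.32, sigma2=1.0)
        print(f"var(w)={var_w:.2f}  delta={d:.2f}  power~{approx_power(d, 0.01):.3f}")
```

With δ=0 the function returns α, and for fixed *n*, *a* _{j} and σ^{2} the approximate power falls as var(*w* _{ij}) falls, which is the qualitative behaviour described above.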

### Variances of predictor variables

The aim of this study is to calculate the power for detecting two or more closely linked QTLs and to extend the power analysis to populations beyond *F* _{2} using both REG and ML interval mapping. When analysing the power for detecting one QTL, we only need the asymptotic behaviour of the variances of the predictor variables to construct the test statistic for power analysis, as has been done by Hu & Xu (2008). For dissecting linked QTLs, we must further derive the covariances between different QTL predictor variables to obtain the asymptotic variance–covariance matrix of the QTL parameters for power analysis. An important step in obtaining the variances and covariances of the predictor variables is to characterize the genotypic distributions of multiple genes in the populations. For example, evaluating *E*(*w* _{ij}) and var(*w* _{ij}) in the BC population obtained by backcrossing the *F* _{1} to the parental population *M* _{j}*N* _{j}/*M* _{j}*N* _{j} requires considering the four flanking marker genotypes of two genes, *M* _{j}*N* _{j}/*M* _{j}*N* _{j}, *M* _{j}*N* _{j}/*M* _{j}*n* _{j}, *M* _{j}*N* _{j}/*m* _{j}*N* _{j} and *M* _{j}*N* _{j}/*m* _{j}*n* _{j} with frequencies (1−*r*)/2, *r*/2, *r*/2 and (1−*r*)/2, where *r* is the recombination fraction between *M* _{j} and *N* _{j} (Xu, 1995). Evaluating them in the *F* _{2} between two populations, *M* _{j}*N* _{j}/*M* _{j}*N* _{j} and *m* _{j}*n* _{j}/*m* _{j}*n* _{j}, requires taking into account ten marker genotypes of two genes, *M* _{j}*N* _{j}/*M* _{j}*N* _{j}, *m* _{j}*n* _{j}/*m* _{j}*n* _{j}, *M* _{j}*n* _{j}/*M* _{j}*n* _{j}, *m* _{j}*N* _{j}/*m* _{j}*N* _{j}, *M* _{j}*N* _{j}/*M* _{j}*n* _{j}, *M* _{j}*N* _{j}/*m* _{j}*N* _{j}, *M* _{j}*n* _{j}/*m* _{j}*n* _{j}, *m* _{j}*N* _{j}/*m* _{j}*n* _{j}, *M* _{j}*N* _{j}/*m* _{j}*n* _{j} and *M* _{j}*n* _{j}/*m* _{j}*N* _{j} with frequencies (1−*r*)^{2}/4, (1−*r*)^{2}/4, *r* ^{2}/4, *r* ^{2}/4, *r*(1−*r*)/2, *r*(1−*r*)/2, *r*(1−*r*)/2, *r*(1−*r*)/2, (1−*r*)^{2}/2 and *r* ^{2}/2 (Hu & Xu, 2008). 
In the progeny populations from *F* _{2}, these ten genotypic frequencies change from generation to generation. For AI populations subject to more cycles of random mating, the well-known formula, *P*′(*M* _{j}*N* _{j})=(1−*r*)×*P*(*M* _{j}*N* _{j})+*r*×*P*(*M* _{j})×*P*(*N* _{j}), can be used to obtain the genotypic frequencies, where *P*′(*M* _{j}*N* _{j}) is the frequency of *M* _{j}*N* _{j} in the next generation. For RI populations subject to further selfing, Haldane & Waddington's transition equations (1931) can be applied to obtain the ten frequencies. Using the same notation as in Haldane & Waddington's paper, we denote the frequency of the *M* _{j}*N* _{j}/*M* _{j}*N* _{j} (*m* _{j}*n* _{j}/*m* _{j}*n* _{j}) genotype as *C*, the frequency of the *M* _{j}*n* _{j}/*M* _{j}*n* _{j} (*m* _{j}*N* _{j}/*m* _{j}*N* _{j}) genotype as *D*, the frequency of the *M* _{j}*N* _{j}/*M* _{j}*n* _{j} (*M* _{j}*N* _{j}/*m* _{j}*N* _{j}, *M* _{j}*n* _{j}/*m* _{j}*n* _{j}, or *m* _{j}*N* _{j}/*m* _{j}*n* _{j}) genotype as *E*, the frequency of the *M* _{j}*N* _{j}/*m* _{j}*n* _{j} genotype as *F* and the frequency of the *M* _{j}*n* _{j}/*m* _{j}*N* _{j} genotype as *G*, respectively, in the populations. With such settings, it is straightforward to show that *E*(*w* _{ij})=0 and to formulate the variance of *w* _{ij} in a population as

$$
\mathrm{var}(w_{ij})=\sum_{k=1}^{10} f_k\,(p_{k1}-p_{k3})^{2},\qquad(3)
$$

where *p* _{k1} and *p* _{k3}, *k*=1, 2, …, 10, are the conditional probabilities of the *Q* _{j}*Q* _{j} and *q* _{j}*q* _{j} genotypes given the ten flanking marker genotypes, and *f* _{k} are the frequencies of the ten marker genotypes for the two flanking markers (*C*, *D*, *E*, *F* and *G*). Note that deriving *p* _{k1} and *p* _{k3} is not as straightforward as in the BC and *F* _{2} populations, because it involves the genotypic distributions of three genes (Kao & Zeng, 2009). If double recombination within a marker interval is ignored, equation (3) can be explicitly formulated as

$$
\mathrm{var}(w_{ij})=2(C+D+E)-4p(1-p)(E+2D),\qquad(4)
$$

where *p*=*r* _{1}/*r* (*r* and *r* _{1} are the recombination fractions between (*M* _{j}, *N* _{j}) and (*M* _{j}, *Q* _{j})). It is interesting to analyse equation (4) to gain some insight into var(*w* _{ij}). In equation (4), var(*w* _{ij}) is bounded by 2(*C*+*D*+*E*), which is the variance of a fully observed QTL-coded variable. The term *p*(1−*p*) measures the relative QTL position in a marker interval, and *E*+2*D* measures the interval size. As the marker interval becomes wider or the QTL gets closer to the centre of the interval, *E*+2*D* or *p*(1−*p*) becomes larger, and the value of var(*w* _{ij}) becomes smaller. In the *F* _{2} population, 2(*C*+*D*+*E*)=1/2, *E*+2*D*=*r*/2 and var(*w* _{ij})=1/2−2*rp*(1−*p*), which is bounded by 1/2. In AI *F* _{t} populations, 2(*C*+*D*+*E*)=1/2 and *E*+2*D*=*r* _{t}/2, where *r* _{t}=[1−(1−*r*)^{t−2}(1−2*r*)]/2. The variance var(*w* _{ij})=1/2−2*r* _{t}*p*(1−*p*), which is also bounded by 1/2 (*p*=*r* _{1t}/*r* _{t}), decreases in the later populations. In RI populations, 2(*C*+*D*+*E*) is between 1/2 and 1, and *E*+2*D* is between *r*/2 and 2*r*/(1+2*r*). The value of var(*w* _{ij}) increases as the population advances. In RIL, 2(*C*+*D*+*E*)=1, *E*+2*D*=2*r*/(1+2*r*) and var(*w* _{ij})=1−(8*r*/(1+2*r*))*p*(1−*p*), which is bounded by 1. Similarly, the variance of the predictor variable for the dominance effect is about 1/4−*r*/2×{1−*r*[(1−2*p*(1−*p*))^{2}+2*p*(1−*p*)]}, which is bounded by 1/4 in the *F* _{2} population. In the BC population, the variance of the predictor variable is var(*w* _{ij})=1/4−*rp*(1−*p*), bounded by 1/4 (see also Xu, 1995). Hu & Xu (2008) formulated var(*w* _{ij}) in the *F* _{2}, RIL and DH populations when double recombinations in the intervals are considered. In general, the larger the variance of a predictor variable, the greater the power in QTL detection.
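The closed forms quoted in this subsection are simple enough to evaluate directly. The sketch below codes the *F* _{2}, AI *F* _{t} and RIL expressions for var(*w* _{ij}) that ignore double recombination, and checks the closed form of *r* _{t} against the random-mating recursion *P*′=(1−*r*)*P*+*r*×*P*(*M* _{j})×*P*(*N* _{j}) with both allele frequencies set to 1/2. The function names are hypothetical. Because double recombination is ignored, the outputs only approximate the exact values reported in Table 1.

```python
def ai_recombination(r, t):
    """Closed form from the text: r_t = [1 - (1-r)^(t-2) (1-2r)] / 2 for AI F_t."""
    return (1.0 - (1.0 - r) ** (t - 2) * (1.0 - 2.0 * r)) / 2.0


def ai_recombination_by_recursion(r, t):
    """Same quantity from P'(MN) = (1-r) P(MN) + r P(M) P(N), with P(M)=P(N)=1/2."""
    p_mn = (1.0 - r) / 2.0            # MN gamete frequency produced by the F1
    for _ in range(t - 2):            # one step per extra cycle of random mating
        p_mn = (1.0 - r) * p_mn + r * 0.25
    return 1.0 - 2.0 * p_mn           # invert P(MN) = (1 - r_t) / 2


def var_w_f2(r, p):
    """F2: var(w) = 1/2 - 2 r p (1-p)."""
    return 0.5 - 2.0 * r * p * (1.0 - p)


def var_w_ai(r, r1, t):
    """AI F_t: var(w) = 1/2 - 2 r_t p (1-p), with p = r_1t / r_t."""
    r_t = ai_recombination(r, t)
    p_t = ai_recombination(r1, t) / r_t
    return 0.5 - 2.0 * r_t * p_t * (1.0 - p_t)


def var_w_ril(r, p):
    """RIL: var(w) = 1 - (8r / (1+2r)) p (1-p)."""
    return 1.0 - (8.0 * r / (1.0 + 2.0 * r)) * p * (1.0 - p)


if __name__ == "__main__":
    r, r1 = 0.09063, 0.04758          # 10-cM interval with the QTL in the middle
    print("F2    ", round(var_w_f2(r, r1 / r), 4))
    for t in (3, 10):
        print(f"AI F{t}", round(var_w_ai(r, r1, t), 4))
    print("RIL   ", round(var_w_ril(r, r1 / r), 4))
```

For the 10-cM example, the *F* _{2} approximation comes out at about 0·455, slightly above the exact value once double recombination is taken into account. The ordering AI *F* _{10} < AI *F* _{3} < *F* _{2} < RIL agrees with the trend described above.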

### Power for detecting a QTL

When only one QTL is considered in the model, Hu & Xu (2008) showed in an example that var(*w* _{ij}) is 0·450 for a QTL located in the middle of a 10-cM marker interval (*r* _{1}=*r* _{2}=0·04758 and *r*=0·09063), and that 252 individuals are required to detect this QTL with 80% power under α=0·01 when the QTL explains 5% of the trait variation in the *F* _{2} population. Our formulae in equation (3) allow us to calculate the values of var(*w* _{ij}) and the sample sizes required in different populations under the same conditions. The values of var(*w* _{ij}) derived using our formulae in the different AI and RI populations are presented in Table 1. The variance behaves differently under selfing and random mating: it increases with further selfing and tends to decrease with successive intercrossing. For example, the values are 0·651 and 0·806 in the RI *F* _{3} and RIL (generation 10 of the RI population is called RIL), respectively, and they are 0·426 and 0·271 in the AI *F* _{3} and AI *F* _{10}, respectively. The different values of var(*w* _{ij}) lead to different non-centrality parameters and thus affect the power of detection. To guarantee 80% power to detect this QTL under α=0·01, about 175, 155, 148 and 143 individuals would be required in the RI *F* _{3}, *F* _{4}, *F* _{5} and RIL populations, and 262, 284, 302 and 426 individuals in the AI *F* _{3}, *F* _{4}, *F* _{5} and *F* _{10} populations. This shows that, when mapping a single QTL located in the interval, fewer individuals are needed in the more advanced RI populations, whereas no such saving is obtained in the later AI populations.
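The dependence of the required sample size on var(*w* _{ij}) can be sketched with a normal approximation to the test. Everything beyond the quoted var(*w* _{ij}) values is an assumption made for illustration: the function name, the choice σ^{2}=1, an effect *a* _{j}^{2} fixed at the value that explains 5% of the variation in the *F* _{2}, the values used for the variance of the fully observed QTL variable, and the inflation of the residual variance by [var(*x* _{ij})−var(*w* _{ij})]×*a* _{j}^{2}.

```python
import math
from statistics import NormalDist


def required_n(var_w, var_x, a2, sigma2=1.0, alpha=0.01, power=0.80):
    """Approximate n for a target power (normal approximation, not the F test)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    residual = sigma2 + (var_x - var_w) * a2   # residual inflated by QTL uncertainty
    return math.ceil(z_sum ** 2 * residual / (var_w * a2))


if __name__ == "__main__":
    # Assumption: the QTL explains 5% of the variation in the F2,
    # where var(x) = 1/2 and sigma^2 = 1.
    a2 = 0.05 / (0.95 * 0.5)
    cases = {                      # population: (var(w) from the text, assumed var(x))
        "F2": (0.450, 0.5),
        "RI F3": (0.651, 0.75),
        "RIL": (0.806, 1.0),
        "AI F3": (0.426, 0.5),
        "AI F10": (0.271, 0.5),
    }
    for name, (var_w, var_x) in cases.items():
        print(f"{name:7s} n ~ {required_n(var_w, var_x, a2)}")
```

Under these assumptions the *F* _{2} case comes out near 248 individuals, a few below the 252 obtained from the *F*-distribution, and the ranking of populations follows the reported one.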

### Covariances between predictor variables

To obtain the covariances between the predictor variables, cov(*w* _{ij}, *w* _{ik}), we need to understand the genotypic distributions of three and four genes in a population. Two linked QTLs, *Q* _{j} and *Q* _{k}, flanked by the marker pairs (*M* _{j}, *N* _{j}) and (*M* _{k}, *N* _{k}), can be located in neighbouring or non-neighbouring marker intervals. For the neighbouring case, the order is *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k} (*N* _{j} and *M* _{k} are the same marker). For the non-neighbouring case, the order is *M* _{j}-*Q* _{j}-*N* _{j}–*M* _{k}-*Q* _{k}-*N* _{k}. Note that the case of QTLs located in non-neighbouring intervals may include additional markers between *N* _{j} and *M* _{k}. For the *M* _{j}-*Q* _{j}-*N* _{j}–*M* _{k}-*Q* _{k}-*N* _{k} order, the two predictor variables, *w* _{ij} and *w* _{ik}, are constructed using the marker pairs (*M* _{j}, *N* _{j}) and (*M* _{k}, *N* _{k}). Therefore, computing their covariance, cov(*w* _{ij}, *w* _{ik}), requires considering all 136 possible genotypes of the *M* _{j}, *N* _{j}, *M* _{k} and *N* _{k} markers (see the Appendix). For the *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k} order, obtaining the covariance only requires evaluating the 36 marker genotypes of the *M* _{j}, *N* _{j} and *N* _{k} markers. In the latter case, it is more difficult to detect *Q* _{j} and *Q* _{k} simultaneously, as they share the same flanking marker *N* _{j}. The covariance between *w* _{ij} and *w* _{ik} can be generally expressed as

where *n* _{g}=36 or 136 and *f* _{k} are the genotypic frequencies of the flanking markers from the trigenic and tetragenic distributions. In the *F* _{2} population, the genotypic distributions of three and four markers can be obtained from the product of the probability distributions of adjacent pairs of genes, i.e. *P*(*M* _{j}*N* _{j}*N* _{k})=*P*(*M* _{j}*N* _{j})×*P*(*N* _{j}*N* _{k}) and *P*(*M* _{j}*N* _{j}*M* _{k}*N* _{k})=*P*(*M* _{j}*N* _{j})×*P*(*N* _{j}*M* _{k})×*P*(*M* _{k}*N* _{k}), under the Haldane map function. For example, the gamete frequency is *P*(*M* _{j}*N* _{j}*M* _{k}*N* _{k})=(1−*r* _{1})(1−*r* _{2})(1−*r* _{3})/2 in the *F* _{2} population, where *r* _{1}, *r* _{2} and *r* _{3} are the recombination fractions between (*M* _{j}, *N* _{j}), (*N* _{j}, *M* _{k}) and (*M* _{k}, *N* _{k}). For the advanced populations beyond *F* _{2}, the trigenic and tetragenic genotypic distributions cannot be obtained from the direct product of pairwise gene distributions. We use the special devices outlined in Kao & Zeng (2009) and in the Appendix to obtain the genotypic distributions of three and four genes. Although the covariance in equation (5) does not have a simple form like that of the variance in equation (4), it can easily be written into a computer programme to obtain the covariances under different situations in different populations. For example, for the *M* _{j}-*Q* _{j}-*N* _{j}–*M* _{k}-*Q* _{k}-*N* _{k} order, the values of cov(*w* _{ij}, *w* _{ik}) are 0·7445, 0·6736, 0·6095, 0·5515 and 0·4991 for …, 15, 20, 25 and 30 cM, respectively, in the *F* _{2} population. For the *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k} order, the covariances in different populations are presented in Table 1. Table 1 shows that the covariance increases under further selfing and decreases under more intercrossing. For example, the covariance is 0·409 in the *F* _{2} population. 
The values are 0·577 and 0·688 in the RI *F* _{3} and RIL, respectively, and they are 0·372 and 0·189 in the AI *F* _{3} and AI *F* _{10}, respectively. Although the covariance can become larger or smaller, the correlations between the coded variables, ρ(*w* _{ij}, *w* _{ik}), all decrease in the advanced populations (Table 1). The correlation is 0·909 in the *F* _{2} populations. It becomes 0·886 and 0·854 in the RI *F* _{3} and RIL, and it is 0·874 and 0·696 in the AI *F* _{3} and *F* _{10} populations. As will be discussed later, the detection of linked QTLs can benefit from the diminishing correlation between predictor variables in the advanced populations.

### Variances of the estimated QTL effects

For a single QTL model, we only need the variance of the coded variable, var(*w* _{ij}), to construct a test statistic in power analysis (equation (2)). As the variance of the estimated effect is the inverse of the information number of QTL effect, i.e. var(*â* _{j})=*I* ^{−1}(*a* _{j}), for *n* large, we have

$$
\mathrm{var}(\hat a_j)=I^{-1}(a_j)\approx\frac{\sigma^{2}}{n\times\mathrm{var}(w_{ij})}\qquad(6)
$$

and var^{−1}(*â* _{j})/*n*=*I*(*a* _{j})/*n*~var(*w* _{ij})/σ^{2} in a single QTL model. For multiple, say *p*, QTLs in the model, the variance–covariance matrix of the predictor variables is required in constructing the test statistics. Similarly, for *n* large, we have *I*(*a*)/*n*=[(*W*′*W*)/σ^{2}]/*n→V*(*W*)/σ^{2}, where *W* denotes the matrix whose *i*, *j*th entry is *w* _{ij} and *V*(*W*) is the variance–covariance matrix with diagonal elements var(*w* _{ij}), *j*=1, 2, …, *p*, and off-diagonal elements cov(*w* _{ij}, *w* _{ik}). Under the normality assumption, *n* ^{1/2}(*â*−a)→N_{p}(0,*V* ^{−1}(*W*)×σ^{2}) (Fuller 1976). Without loss of generality, we present the case of *p*=2 with *Q* _{j} and *Q* _{k} in the model for a better illustration. For *p*=2, the *V* ^{−1}(*W*) matrix is

$$
V^{-1}(W)=\frac{1}{1-\rho^{2}(w_{ij},w_{ik})}
\begin{bmatrix}
\dfrac{1}{\mathrm{var}(w_{ij})} & \dfrac{-\rho(w_{ij},w_{ik})}{\sqrt{\mathrm{var}(w_{ij})\,\mathrm{var}(w_{ik})}}\\[2ex]
\dfrac{-\rho(w_{ij},w_{ik})}{\sqrt{\mathrm{var}(w_{ij})\,\mathrm{var}(w_{ik})}} & \dfrac{1}{\mathrm{var}(w_{ik})}
\end{bmatrix},\qquad(7)
$$

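where ρ(*w* _{ij}, *w* _{ik})=cov(*w* _{ij}, *w* _{ik})/[var(*w* _{ij})×var(*w* _{ik})]^{1/2} is the correlation between the two predictor variables. Therefore, the variances of estimated *a* _{j} and *a* _{k} are

$$
\mathrm{var}(\hat a_j)=\frac{1}{1-\rho^{2}(w_{ij},w_{ik})}\times\frac{\sigma^{2}}{n\times\mathrm{var}(w_{ij})}\qquad(8)
$$

and

$$
\mathrm{var}(\hat a_k)=\frac{1}{1-\rho^{2}(w_{ij},w_{ik})}\times\frac{\sigma^{2}}{n\times\mathrm{var}(w_{ik})},\qquad(9)
$$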

respectively. Comparing equations (6) and (8) shows that the variances of the estimated QTL effects are affected not only by var(*w* _{ij}) and var(*w* _{ik}) but also by cov(*w* _{ij}, *w* _{ik}) through ρ(*w* _{ij}, *w* _{ik}). The first term on the right-hand side of equation (8), 1/[1−ρ^{2}(*w* _{ij}, *w* _{ik})], is usually called the variance inflation factor (VIF), which can also be expressed in terms of the information numbers, *I*(*a* _{j}), *I*(*a* _{k}) and *I*(*a* _{j}, *a* _{k}), as

$$
\mathrm{VIF}=\frac{1}{1-\rho^{2}(w_{ij},w_{ik})}=\frac{I(a_j)\,I(a_k)}{I(a_j)\,I(a_k)-I^{2}(a_j,a_k)}.\qquad(10)
$$

The VIF can measure the inflation level of the variance of an estimate (Marquardt, 1970). When ρ(*w* _{ij}, *w* _{ik})=0, VIF=1 and there is no variance inflation. If ρ(*w* _{ij}, *w* _{ik})≠0, VIF>1 indicating that the inflation of variances occurs. In general, large VIF indicates seriously inflated variances and a severe collinearity problem, and the linked QTL are not likely to be detected statistically. For the same *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k} order considered in Table 1, the value of VIF in var(*â* _{j}) or var(*â* _{k}) is 5·750 ((1−0·909^{2})^{−1}), implying that its variance is inflated by 5·750 times as compared to when they are unlinked. The values of VIF are 4·651 and 3·694 in the RI *F* _{3} and RIL, respectively, and they are 4·650 and 1·940 in the AI *F* _{3} and AI *F* _{10}, respectively. The values of VIF become smaller in the more advanced populations. Therefore, advanced populations have the ability to provide smaller VIF values for more powerful QTL detection (more explanation is given below). Also, the VIF is generally larger when interval sizes become wider or the putative QTL move towards the centres of intervals (not shown). With VIF, *V* ^{−1}(*W*) in equation (7) can be simplified in expression as *V* ^{−1}(*W*)=VIF×*A* _{0}, where *A* _{0}=[*a* _{ij}]_{2×2} denotes the 2×2 matrix in the equation.
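The VIF values quoted above can be checked from the variances and covariances alone. The following is a minimal sketch with hypothetical function names. It assumes var(*w* _{ik})=var(*w* _{ij})=0·450 for the *F* _{2} example, which the text does not state explicitly, and it uses the standard adjugate formula for the inverse of a 2×2 matrix.

```python
import math


def correlation(var_j, var_k, cov_jk):
    """rho(w_ij, w_ik) = cov / sqrt(var_j * var_k)."""
    return cov_jk / math.sqrt(var_j * var_k)


def vif(rho):
    """Variance inflation factor for two predictors: 1 / (1 - rho^2)."""
    return 1.0 / (1.0 - rho ** 2)


def inverse_covariance(var_j, var_k, cov_jk):
    """Inverse of the 2x2 matrix V(W) via the adjugate formula."""
    det = var_j * var_k - cov_jk ** 2
    return [[var_k / det, -cov_jk / det],
            [-cov_jk / det, var_j / det]]


def effect_variance(var_w, rho, sigma2, n):
    """var(a_hat) = VIF * sigma^2 / (n * var(w)) in the two-QTL model."""
    return vif(rho) * sigma2 / (n * var_w)


if __name__ == "__main__":
    # F2 example: var(w_ij) = 0.450 and cov = 0.409 (equal variances assumed).
    rho = correlation(0.450, 0.450, 0.409)
    print("rho =", round(rho, 3), " VIF =", round(vif(rho), 2))
    # Correlations quoted for RI F3, RIL and AI F10.
    for label, r in (("RI F3", 0.886), ("RIL", 0.854), ("AI F10", 0.696)):
        print(label, "VIF =", round(vif(r), 3))
```

The diagonal entries of `inverse_covariance` equal VIF/var(*w* _{ij}) and VIF/var(*w* _{ik}), which is the factorization *V* ^{−1}(*W*)=VIF×*A* _{0} used in the text.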

### Test statistics for detecting linked QTL

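We now derive the test statistics for analysing the separation of linked QTL and calculating the separating power. Let

$$
t_j=\frac{\hat a_j-a_j}{\sigma_j}\quad\text{and}\quad t_k=\frac{\hat a_k-a_k}{\sigma_k}\qquad(11)
$$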

be the standardized estimated QTL effects, where σ_{j}^{2}=VIF×*a* _{11}×σ^{2}/*n* and σ_{k}^{2}=VIF×*a* _{22}×σ^{2}/*n* are the variances of the estimated effects (*a* _{11}=var^{−1}(*w* _{ij}) and *a* _{22}=var^{−1}(*w* _{ik})). As *I* ^{−1}(*a* _{j})=(*a* _{11}×σ^{2})/*n* and *I* ^{−1}(*a* _{k})=(*a* _{22}×σ^{2})/*n*, it is more convenient and succinct to express σ_{j}^{2} and σ_{k}^{2} as

$$
\sigma_j^{2}=\mathrm{VIF}\times I^{-1}(a_j)\quad\text{and}\quad\sigma_k^{2}=\mathrm{VIF}\times I^{-1}(a_k)\qquad(12)
$$

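in a population. Accordingly, *t* _{j} and *t* _{k} jointly follow a bivariate normal distribution with mean zero and a covariance matrix whose diagonal elements are one and whose off-diagonal elements are ρ(*w* _{ij}, *w* _{ik}), as

$$
\begin{pmatrix}t_j\\ t_k\end{pmatrix}\sim N_2\!\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1 & \rho(w_{ij},w_{ik})\\ \rho(w_{ij},w_{ik}) & 1\end{pmatrix}\right).
$$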

Given a pre-specified critical value *c* at the significance level α, the power of separation is the sum of the probabilities that *t* _{j} and *t* _{k} are simultaneously different from zero:

$$
P(t_j>c,\,t_k>c)+P(t_j>c,\,t_k<-c)+P(t_j<-c,\,t_k>c)+P(t_j<-c,\,t_k<-c)\qquad(13)
$$

in the bivariate normal distribution. Note that the sum of four probabilities is equivalent to Type I error α under the null hypothesis (*H* _{0}: *a* _{j}=0 and *a* _{k}=0). Under the alternative hypothesis (*H* _{1}: *a* _{j}≠0 and *a* _{k}≠0), equation (13) is the power to reject *H* _{0} and allows us to evaluate the power of separation for different values of *a* _{j} and *a* _{k} in different populations (see section 4).
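The four-quadrant probability can be evaluated numerically without special libraries. The sketch below is a hypothetical helper, not the procedure used in the study. It treats the two standardized statistics as unit-variance bivariate normal with correlation `rho`, assumes that under *H* _{1} their means are *a* _{j}/σ_{j} and *a* _{k}/σ_{k} (passed in as `mu_j` and `mu_k`), and integrates the conditional normal of the second statistic over the rejection region of the first with a trapezoid rule.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def separation_power(mu_j, mu_k, rho, c, grid=4000, span=10.0):
    """P(|Z_j| > c and |Z_k| > c) for a unit-variance bivariate normal.

    mu_j, mu_k : means of the standardized statistics (zero under H0)
    rho        : correlation assigned to the two statistics, |rho| < 1
    c          : critical value applied to both statistics
    """
    s = math.sqrt(1.0 - rho * rho)            # conditional s.d. of Z_k given Z_j
    lo = mu_j - span
    h = 2.0 * span / grid
    total = 0.0
    for i in range(grid + 1):
        z = lo + i * h
        if abs(z) <= c:                       # Z_j not significant: contributes nothing
            continue
        m = mu_k + rho * (z - mu_j)           # conditional mean of Z_k given Z_j = z
        tail = STD.cdf((-c - m) / s) + 1.0 - STD.cdf((c - m) / s)
        weight = 0.5 if i in (0, grid) else 1.0
        total += weight * STD.pdf(z - mu_j) * tail
    return total * h


if __name__ == "__main__":
    for mu in (1.0, 2.0, 3.0, 4.0):
        print(mu, round(separation_power(mu, mu, 0.9, c=2.8), 3))
```

With both means at zero and `rho=0`, the result reduces to the product of the two marginal two-sided tail probabilities, which gives a quick sanity check on the integration.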

When an ML interval mapping is implemented in separating linked QTLs, the model is a normal mixture model under the assumption of normal errors. We use *x* _{ij}*'s to denote the predictor variables in the ML interval mapping models. By treating *x* _{ij}*'s as missing data and *y* _{i} as observed data, we can apply the EM algorithm to obtain the MLE and information matrix by operating on the complete-data likelihood

$$
L_c(\theta)=\prod_{i=1}^{n} f\!\left(y_i\mid\theta,x_{i1}^{*},\ldots,x_{ip}^{*}\right)\,g\!\left(x_{i1}^{*},\ldots,x_{ip}^{*}\right).
$$

For *p* QTL, there are 3^{p} QTL genotypes, and let μ_{j}, *j*=1, 2, …, 3^{p}, denote their genotypic values. In the complete-data likelihood, the conditional distribution of the observed data given the missing data, *f*(*y* _{i}|θ, *x* _{i1}*, …, *x* _{ip}*), is a normal distribution *N*(μ_{j}, σ^{2}), and *g*(*x* _{i1}*, …, *x* _{ip}*) is a 3^{p}-nomial distribution depending on the values of the *x* _{ij}*'s (QTL genotypes). Let the *q* _{ij}'s be the 3^{p}-nomial probabilities derived from the conditional probabilities of the QTL genotypes given the flanking marker genotypes. Both the MLE and the observed information matrix involve the posterior probabilities of the QTL genotypes, *π* _{ij} (see Kao & Zeng (1997) for more details about the derivations). Therefore, for *p*=2, evaluating the (expected) information numbers, *I*(*a* _{j}), *I*(*a* _{k}) and *I*(*a* _{j}, *a* _{k}), requires integrating over the distribution of markers and traits, and is thus more challenging. Here, we suggest a Monte Carlo simulation approach: simulate a large number of individuals, say 10 000, and approximate the expected *π* _{ij} by the average of the values of *π* _{ij} over the simulated individuals. In turn, the information numbers can be obtained. Similarly to REG interval mapping, we can write *I*(*a* _{j})/*n*=var(*x* _{ij}*)/σ^{2} and *I*(*a* _{j}, *a* _{k})/*n*=cov(*x* _{ij}*, *x* _{ik}*)/σ^{2} for sufficiently large *n* in ML interval mapping. Table 1 presents the values of *I*(*a* _{j}) and *I*(*a* _{j}, *a* _{k}) for the same case of *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k} order. The values are obtained by simulating trait values governed by two QTLs with equal effects, with heritability *h* ^{2}=0·05 and σ^{2}=1. As σ^{2}=1, *I*(*a* _{j})=var(*x* _{ij}*) and *I*(*a* _{j}, *a* _{k})=cov(*x* _{ij}*, *x* _{ik}*). 
The values of var(*x* _{ij}*) are 0·437, 0·640 and 0·815 in the *F* _{2}, RI *F* _{3} and RIL, respectively, and are 0·428 and 0·310 in the AI *F* _{3} and *F* _{10}, respectively (the values of var(*x* _{ik}*) are of very similar size and are not presented). These variances are of similar size to var(*w* _{ij}) in REG interval mapping. The values of cov(*x* _{ij}*, *x* _{ik}*) are 0·380, 0·563, 0·691, 0·370 and 0·192 in the *F* _{2}, RI *F* _{3}, RIL, AI *F* _{3} and AI *F* _{10} populations. Except for the values in generation 10, these values are smaller than the values of cov(*w* _{ij}, *w* _{ik}) in REG interval mapping (the values of cov(*w* _{ij}, *w* _{ik}) are 0·409, 0·577, 0·688, 0·392 and 0·189, respectively). The correlations between the QTL-coded variables can also be obtained (Table 1). In general, the predictor variables in ML interval mapping have smaller covariances (correlations). Therefore, the ML method will have smaller VIF values when fitting closely linked QTL together. The values of VIF are 4·084, 4·299 and 3·232 in the *F* _{2}, RI *F* _{3} and RIL, respectively, and are 3·999 and 1·655 in the AI *F* _{3} and AI *F* _{10}, respectively. These results indicate that ML interval mapping suffers less from the collinearity problem, and that it can be more efficient and powerful in detecting linked QTLs, as will be further validated in sections 3 and 4. By obtaining the information numbers of the QTL effects for ML interval mapping, the components in equation (12) can be updated to construct the test statistics *t* _{j}=(*â* _{j}−*a* _{j})/σ_{j} and *t* _{k}=(*â* _{k}−*a* _{k})/σ_{k} for ML interval mapping. Then, using the bivariate normal distribution, the hypothesis *H* _{0}: *a* _{j}=0 and *a* _{k}=0 can be tested to calculate the power of ML interval mapping.

When more, say *p*, QTLs are considered in the REG interval mapping model, the information matrix of the parameters is *I*(*a*)=(*W*′*W*)/σ^{2}. It can be shown that *I*(*a*)/*n*~*V*(*W*)/σ^{2}. As *V*(*W*) is invertible, we can express *V* ^{−1}(*W*)=VIF×*A* _{0}, where *A* _{0}=[*a* _{ij}]_{p×p}. For ML interval mapping, the information matrix can be obtained by using the general formulae of Kao & Zeng (1997). Similarly, when the sample size grows large, *I*(*a*)/*n* can be expressed as *V*(*X**)/σ^{2} (*X** denotes the matrix whose *i*, *j*th entry is *x* _{ij}*), whose diagonal elements are the expected *I*(*a* _{j}), *j*=1, 2, …, *p*, and whose off-diagonal elements are the expected *I*(*a* _{j}, *a* _{k}). The *V*(*X**) matrix is also invertible and can be formulated as *V* ^{−1}(*X**)=VIF×*A* _{0}. For both REG and ML interval mapping, we can define σ_{j}^{2}=VIF×*a* _{jj}×σ^{2}/*n*, where *a* _{jj}, *j*=1, 2, …, *p*, denote the diagonal elements of *A* _{0}. Then, we can construct the standardized estimated effects as *t* _{j}=(*â* _{j}−*a* _{j})/σ_{j}, *j*=1, 2, …, *p*, and (*t* _{1}, *t* _{2}, …, *t* _{p})′ follows a *p*-variate normal distribution. Given specified critical values, the probability of significance can be calculated (Genz & Bretz, 2009) to evaluate the power of separating more linked QTLs.

### Genetic parameters and residual variances

Further, the relationship between the environmental variance, σ^{2}, and the genetic variance, *V* _{G}, can be formulated as σ^{2}=*V* _{G}×(1−*h* ^{2})/*h* ^{2}, where *h* ^{2} is the heritability of the quantitative trait variation. The genetic variance can be decomposed into components of genotypic frequencies and QTL effects. For two QTLs with additive effects only, *V* _{G}=var(*x* _{ij})×*a* _{j}^{2}+var(*x* _{ik})×*a* _{k}^{2}+2×cov(*x* _{ij}, *x* _{ik})×*a* _{j}×*a* _{k}, where *x* _{ij} and *x* _{ik} denote the coded variables of the two fully observed QTLs (see Kao & Zeng, 2009 for the components of *V* _{G} with complete effects and contributed by more QTLs). As var(*x* _{ij})=var(*x* _{ik})=2(*C*+*D*+*E*) and cov(*x* _{ij}, *x* _{ik})=2(*C−D*) depend on the genotypic distribution of the experimental population, *V* _{G} is population dependent for given QTL effects. For example, var(*x* _{ij})=1/2 and cov(*x* _{ij}, *x* _{ik})=(1−2*r* _{t})/2 in AI *F* _{t} populations, and 1/2<var(*x* _{ij})<1 and (1−2*r*)/2<cov(*x* _{ij}, *x* _{ik})<(1−2*r*)/(1+2*r*) in RI *F* _{t} populations. Therefore, a more detailed formulation of *V* _{G} can also be expressed as

$$
V_G=2(C+D+E)\left(a_j^{2}+a_k^{2}\right)+4(C-D)\,a_j a_k.
$$

Xu (1995) pointed out that the residual variance in REG interval mapping is inflated by the uncertainty of the QTL genotype, and that the amount of inflation is about [var(*x* _{ij})−var(*w* _{ij})]×*a* _{j}^{2} in a single QTL model. For a multiple QTL model, the amount is about ∑_{j}[var(*x* _{ij})−var(*w* _{ij})]×*a* _{j}^{2}, ignoring the covariance parts. If double recombination in the interval is negligible, this amount can be expressed as 4*p*(1−*p*)(*E*+2*D*)×*a* _{j}^{2} in a single QTL model. For *p* QTL, *Q* _{j}, in *p* distinct intervals, (*M* _{j}, *N* _{j}), *j*=1, 2, …, *p*, the amount of inflation is about ∑_{j}4*p* _{j}(1−*p* _{j})(*E* _{j}+2*D* _{j})×*a* _{j}^{2}, where *p* _{j}=*r* _{1j}/*r* _{j} (*r* _{j} and *r* _{1j} are the recombination fractions between (*M* _{j}, *N* _{j}) and between (*M* _{j}, *Q* _{j})), and *E* _{j} (*D* _{j}) is the frequency of *M* _{j}*N* _{j}/*M* _{j}*n* _{j} (*M* _{j}*n* _{j}/*M* _{j}*n* _{j}) in the population. There is no inflation if the QTLs are completely observed (coincident with markers). Therefore, when QTLs are located within intervals and inferred from flanking markers, the inflation of the residual variance reduces the QTL detection power compared with the power of detecting completely observed QTL (see also section 5).
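The bookkeeping in this subsection can be sketched in a few lines. The function names are hypothetical. The residual variance is obtained from the standard definition *h* ^{2}=*V* _{G}/(*V* _{G}+σ^{2}), which is assumed here. The *F* _{2} moments var(*x* _{ij})=1/2 and cov(*x* _{ij}, *x* _{ik})=(1−2*r*)/2 are the *t*=2 case of the AI expressions given above.

```python
def genetic_variance(var_x, cov_x, a_j, a_k):
    """V_G = var(x)(a_j^2 + a_k^2) + 2 cov(x_j, x_k) a_j a_k (additive effects only)."""
    return var_x * (a_j ** 2 + a_k ** 2) + 2.0 * cov_x * a_j * a_k


def residual_variance(v_g, h2):
    """sigma^2 implied by h^2 = V_G / (V_G + sigma^2) (assumed definition)."""
    return v_g * (1.0 - h2) / h2


def f2_moments(r):
    """var(x_ij) and cov(x_ij, x_ik) for two fully observed QTLs in the F2."""
    return 0.5, (1.0 - 2.0 * r) / 2.0


def reg_inflation(p, e_plus_2d, a):
    """Residual inflation 4 p (1-p) (E+2D) a^2 for one QTL, no double recombination."""
    return 4.0 * p * (1.0 - p) * e_plus_2d * a ** 2


if __name__ == "__main__":
    var_x, cov_x = f2_moments(r=0.1)
    for a_j, a_k in ((1.0, 1.0), (1.0, -1.0)):
        v_g = genetic_variance(var_x, cov_x, a_j, a_k)
        print(a_j, a_k, "V_G =", round(v_g, 3),
              "sigma^2 at h2=0.2:", round(residual_variance(v_g, 0.2), 3))
    # F2 interval with r = 0.1, QTL in the middle: E + 2D = r / 2.
    print("inflation =", reg_inflation(p=0.5, e_plus_2d=0.05, a=1.0))
```

For the same effect magnitudes, linked QTLs with effects in opposite directions give a much smaller *V* _{G} than QTLs with effects in the same direction, because the covariance term changes sign.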

The above analyses decompose equation (12) into components related to sample size, QTL effects, distance between genes, interval size and the genotypic distribution of a population. They pave the way to predict and analyse the power of separation under these factors, across populations and using different methods, and to conduct the QTL analysis when QTLs are completely observed (coincident with markers) or not observable (located within the marker intervals). The validity of the proposed formulae in predicting the power of separating linked QTLs is first checked by Monte Carlo simulations, and then the formulae are applied to the power analysis under several mapping factors in different populations.

## Numerical analysis

On the basis of our proposed formulae, numerical analyses of the power of dissecting closely linked QTLs under various mapping factors and in different experimental populations are shown in Fig. 2(*a*–*d*). The factors considered are sample size, QTL effect, interval size and distance between QTLs, and the populations considered include the *F* _{2}, AI and RI. Both REG and ML interval mapping are applied to the power analysis. In all the cases, we assume *h* ^{2}=0·2. Figure 2 *a* shows the power curves of separating two QTLs located in 10 or 20 cM spaced marker intervals under different distances. The order considered is *M* _{j}-*Q* _{j}-*N* _{j}–*M* _{k}-*Q* _{k}-*N* _{k}, and both QTL are located right in the middle of their intervals (each QTL is 5 cM from both of its flanking markers in the case of the 10 cM intervals, and 10 cM in the case of the 20 cM intervals). The distances between QTLs are 20, 25, 30, 35, 40, 45 and 50 cM, respectively (…, 15, 20, 25 and 30 cM in the 10 cM intervals, and …, 5, 10, 15 and 20 cM in the 20 cM intervals). The two QTLs have equal effects and the sample size is 200. The figure shows that, for a given distance between QTLs, the power of separation is larger when the QTLs are in the narrower intervals. Also, the powers by ML interval mapping are higher than those by REG interval mapping. As mentioned earlier, separating linked QTLs is most difficult for the *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k} (*M* _{j}-*Q* _{j}-*M* _{k}-*Q* _{k}-*N* _{k}) order, because the QTLs share a common flanking marker. Figures 2 *b*–*d* present the powers of separating two 10-cM-apart QTLs in the *F* _{2}, AI and RI populations for this order. Assume that *d* _{MjQj}=5 cM, *d* _{QjNj}=5 cM, *d* _{NjQk}=5 cM and *d* _{QkNk}=5 cM, and that the QTLs have equal effects. In Fig. 2 *b*, with a sample size of 500, the powers of REG and ML interval mapping are very low (close to zero) in the *F* _{2} and *F* _{3} populations. However, the powers increase in the more advanced populations. 
The powers increase to 0·238 and 0·670 using REG interval mapping in the AI *F* _{6} and RI *F* _{6} populations, and they increase to 0·367 and 0·741, respectively, using the ML method. Figure 2 *b* also presents the powers of separation when *Q* _{j} and *Q* _{k} are completely observed (and fitted into the model). As expected, the powers are greater when they are completely observed (the curves with solid and empty triangles). For example, the power is 0·427 in *F* _{2}, and it becomes 0·732 and 0·925 in the AI *F* _{3} and RI *F* _{3} populations, respectively. The powers gradually exceed 0·99 in more advanced populations. Figure 2 *c* shows the powers of separating two fully observed linked QTLs under different sample sizes. The QTLs have equal effects. The powers are about 0, 0·001, 0·037, 0·198 and 0·427 for *n*=100, 300 and 500, respectively, in the *F* _{2} population, and are 0·059 (0·032), 0·194 (0·518), 0·565 (0·862), 0·815 (0·968) and 0·930 (0·994), respectively, in the AI *F* _{5} (RI *F* _{5}) populations. This shows that advanced populations can be much more efficient, and that the RI populations can be more powerful than the AI populations in separation. Figure 2 *d* illustrates the relations between power and sample size when separating 10-cM-apart QTL with different effect sizes in the *F* _{2} population. The QTLs are assumed to be completely observed. The powers of separating QTLs with similar effect sizes (e.g. *a* _{j}:*a* _{k}=1:1) are higher than those of separating QTLs with different effect sizes (e.g. *a* _{j}:*a* _{k}=2:1), and the powers of separating QTLs with effects in different directions (e.g. *a* _{j}:*a* _{k}=1:−1) are much higher than those with effects in the same direction (e.g. *a* _{j}:*a* _{k}=1:1). For example, the powers are 0·236, 0·344, 0·427, 0·981 and 1·000 (0·298) for the effect ratio 1:2, 1:1·5, 1:1, 1:−1·5 and 1:−1 with *n*=500, respectively. 
In general, an effective separation of closely linked QTLs requires large *n*, high *h* ^{2}, small ρ and more QTL information in a population.

Fig. 2. (a) Power curves of separating two linked QTLs located in the middle of the 10- or 20-cM-spaced marker intervals under various distances in the *F* _{2} population. The order considered is *M* _{j}-*Q* _{j}-*N* _{j}–*M* _{k}-*Q* _{k}-*N* _{k}. The distances between QTLs are 20, 25, 30, 35, 40, 45 and 50 cM, respectively. The two QTLs have equal effects, and *n*=200. (b) Power curves of separating two 10-cM-apart QTLs when QTLs are coincident with markers (MR) or located in the intervals (REG and ML) in the AI and RI populations. QTLs have equal effects and *n*=500. The order considered is *M* _{j}-*Q* _{j}-*N* _{j}-*Q* _{k}-*N* _{k}. (c) Power curves of separating two 10-cM-apart QTLs under different sample sizes in the AI and RI populations. QTLs have equal effects and are located at markers. (d) Power curves of separating two 10-cM-apart QTLs with different sizes of effects under different sample sizes in the *F* _{2} population. QTLs are assumed to be located at markers. In all cases, *h* ^{2}=0·2, and α=0·005 is chosen as the significance level.

## Discussion

QTL mapping is a key approach to understanding and estimating the genetic architecture of quantitative traits in quantitative genetics (Zeng *et al.*, 1999). In QTL mapping, when QTLs are tightly linked, the estimates of QTL parameters can easily be biased, and the power of detection can be reduced. Therefore, detecting and separating linked QTLs correctly and efficiently remains an important issue in QTL mapping (Lander & Botstein, 1989; Ronin *et al.*, 1999; Hu & Xu, 2008). We tackle this issue by developing test statistics to test the effects of QTLs located at the markers or in the intervals. Both the REG and ML interval mapping models are considered. By characterizing the genotypic distributions of three and four genes, we are able to evaluate the variances and covariances of the predictor variables of QTLs in the models, and then to construct test statistics for detecting linked QTLs under a wider range of situations. Our proposed test statistics are simple functions of the information numbers, VIF and genetic parameters of the models in the populations. They allow us to predict the power of separating linked QTLs under different mapping factors and across different populations. The direct application of our approach to QTL mapping requires that the intervals potentially containing QTLs be known for testing. However, those potential intervals are not known before a preliminary analysis is carried out. To identify them, a multi-dimensional search along the whole genome, such as screening all pairs of close intervals, may not be appropriate because of its substantial computational burden. In practice, one suggestion is to first use one-QTL model analysis (one-dimensional search) to identify the regions containing potential intervals. In the likelihood profiles of the one-dimensional search, regions showing significant sign changes in the estimated QTL effects, or showing wide and significant peaks (ghost QTL), may contain potential intervals (Haley & Knott, 1992; Kao *et al.*, 1999; Zeng *et al.*, 1999). Our approach can then be applied to these potential intervals for further analysis of closely linked QTLs.

The different advanced populations have different population structures, such as homozygosities, linkage disequilibria (correlations between genes) and genotypic frequencies (Weir, 1996). Therefore, they show different properties in the resolution of closely linked QTLs. When QTLs are linked, their correlation can generally be formulated as 1−2*R*, where *R* is the proportion of recombinants in a population. In a population, the more closely they are linked, the fewer recombinants are produced and the stronger the correlation is. Fitting linked QTLs is equivalent to fitting correlated variables into the model, which causes collinearity problems in statistical estimation. Consequently, the separation becomes more difficult for closer QTLs as the collinearity problem becomes more severe. The obvious way to relieve the collinearity problem is to increase the proportion of recombinants in a population. In the BC or *F* _{2} populations, the proportion of recombinants is equivalent to the recombination fraction between QTLs (*R*=*r*). In the AIL and RIL populations, more recombinants can be produced and accumulated, so that *R*>*r* as generations proceed. These advanced populations therefore provide smaller VIFs and reduced correlations between the QTL parameters, which facilitates QTL detection. Nevertheless, it should be noted that the marker intervals containing QTLs may expand (relative to those in the backcross or *F* _{2} population) in the more advanced AI populations (Lynch & Walsh, 1998; Kao & Zeng, 2009), so that the benefit may be offset. Greatly improving separation in the AI populations requires denser markers around the detected QTL (QTLs located in narrow intervals), and the improvement would be limited if QTLs are in a sparse marker region (wide intervals). The more powerful separation in the later RI populations is also due to the increase in additive variance (accumulation of homozygotes). For example, the additive variance of a QTL in RIL can be twice that in the *F* _{2} population, and the power of separation can be much higher in the RIL (see Fig. 2 *b*, *c*). By making good use of the properties of genome structures in the later advanced populations, it is possible to improve the resolution of closely linked QTLs in QTL detection.
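As a rough numerical illustration of how accumulated recombinants weaken the correlation 1−2*R* and shrink the VIF, the Python sketch below compares an *F* _{2}, an AI *F* _{t} and a fully inbred RIL. The expressions for *R* are assumptions brought in from the wider literature rather than from this paper: Haldane's map function for converting cM to *r*, the Darvasi & Soller (1995) expression for random intercrossing from the *F* _{2}, and the Haldane & Waddington (1931) limit for selfing to fixation (so it does not describe the intermediate RI *F* _{t} generations). The VIF uses the standard two-predictor form 1/(1−ρ^{2}), and all function names are hypothetical.

```python
from math import exp


def haldane_r(d_cm):
    """Recombination fraction from map distance in cM (Haldane's map function)."""
    return 0.5 * (1.0 - exp(-2.0 * d_cm / 100.0))


def recombinant_proportion(r, population="F2", t=2):
    """Proportion of recombinants R (assumed textbook expressions).

    "F2":  R = r.
    "AI":  random intercrossing from the F2 up to generation t.
    "RIL": limit of repeated selfing (fully inbred lines).
    """
    if population == "F2":
        return r
    if population == "AI":
        return 0.5 * (1.0 - (1.0 - r) ** (t - 2) * (1.0 - 2.0 * r))
    if population == "RIL":
        return 2.0 * r / (1.0 + 2.0 * r)
    raise ValueError(f"unknown population: {population}")


def vif(rho):
    """Variance inflation factor for a model with two correlated predictors."""
    return 1.0 / (1.0 - rho**2)


r = haldane_r(10.0)  # two QTLs 10 cM apart
for label, kwargs in [("F2", {"population": "F2"}),
                      ("AI F6", {"population": "AI", "t": 6}),
                      ("RIL", {"population": "RIL"})]:
    big_r = recombinant_proportion(r, **kwargs)
    rho = 1.0 - 2.0 * big_r  # correlation between the two QTLs
    print(f"{label}: R={big_r:.3f}, rho={rho:.3f}, VIF={vif(rho):.2f}")
```

The sketch covers only the correlation and VIF. It ignores the expansion of marker intervals in AI populations and the gain in additive variance in RI populations discussed above.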

Given a distance between QTLs, the powers of separating QTLs at the markers are greater than those in the intervals (Fig. 2 *b*). To detect QTLs located in the intervals, the REG and ML interval mapping models are widely used in the separation. In either of the two statistical models, when the flanking marker intervals become wider or the QTLs lie closer to the middle of the intervals, the variances of the predictor variables become smaller and their correlations become larger (not shown). Consequently, their detection would be more difficult (Fig. 2 *a*). Our proposed formulae take the QTL positions and effects and the population structures into account together to predict the power of separation. In general, given a distance between QTLs, separation can be more effective for QTLs of similar size, located closer to markers and in narrow intervals, with effects in opposite directions, and contributing a high proportion of trait variation. Also, it is possible to gain more power in QTL detection by utilizing more advanced populations. The results may facilitate the analysis of QTL resolution in the genetic study of quantitative traits.