A study on the mapping of quantitative trait loci in advanced populations derived from two inbred lines

Summary

In genetic and biological studies, the F2 population is one of the most popular and commonly used experimental populations, mainly because it can be readily produced and its genome structure possesses several niceties that allow for productive investigation. These niceties include the equivalence between the proportion of recombinants and the recombination rate, the capability of providing a complete set of three genotypes for every locus and an analytically attractive first-order Markovian property. Recently, there has been growing interest in using the progeny populations from F2 (advanced populations) because their genomes can be managed to meet specific purposes or can be used to enhance investigative studies. These advanced populations include recombinant inbred populations, advanced intercrossed populations, intermated recombinant inbred populations and immortalized F2 populations. Due to an increased number of meiosis cycles, the genomes of these advanced populations no longer possess the Markovian property and are relatively more complicated than, and different from, the F2 genomes. Although issues related to quantitative trait locus (QTL) mapping using advanced populations have been well documented, these advanced populations are still often investigated in the same way as F2 populations, using a first-order Markovian assumption. Therefore, more effort is needed to address the complexities of these advanced populations in more detail. In this article, we attempt to tackle these issues by first modifying current methods developed under this Markovian assumption to propose an ad hoc method (the Markovian method) and exploring its possible problems. We then consider the specific genome structures present in the advanced populations without invoking this assumption to propose a more adequate method (the non-Markovian method) for QTL mapping. 
Further, some QTL mapping properties related to the confounding problems that result from ignoring epistasis and to mapping closely linked QTL are derived and investigated across the different populations. Simulations show that the non-Markovian method outperforms the Markovian method, especially in the advanced populations subject to selfing. The results presented here may give some clues to the use of advanced populations for more powerful and precise QTL mapping.

1. Introduction

Many quantitative trait loci (QTLs) detection experiments and statistical QTL mapping methods are conducted and developed on the basis of the backcross and F2 populations. These two populations are popular mainly for economic reasons as they can be readily generated for use in experiments, thus saving time and money. Further, due to the fact that these populations undergo just a single cycle of meiosis, they have several significant features that make them attractive for general purpose genetic and biological studies (Lander & Botstein, 1989; Jansen, 1993; Zeng, 1994; Jiang & Zeng, 1997; Kao et al., 1999; Xu, 2007). For example, the recombination rate between different loci is equivalent to the proportion of the recombinants and their genomes have a first-order Markovian property in the two populations. Also, the progeny populations after F2 (advanced populations) have been well devised and implemented in genetic studies. These advanced populations include recombinant inbred (RI) populations, advanced intercrossed (AI) populations, intermated recombinant inbred (IRI) populations and immortalized F2 populations. For a review of these advanced populations, see e.g. Rockman & Kruglyak (2008).

These advanced populations have some very useful features in that their genomic structures allow investigators to achieve better performance in their studies. For example, the RI populations consist of nearly fixed genomes for multiple phenotyping and contain a specific genotype to increase the accuracy of assessment in studying quantitative traits (Lander & Botstein, 1989). Further, the AI populations can harbour more recombination events in a short chromosome segment for genetic fine mapping (Darvasi, 1998). Also, the IRI and RIX (recombinant inbred intercrosses) populations can be managed to have both the advantages of RI and AI populations (Liu et al., 1996; Hua et al., 2002; Winkler et al., 2003; Zou et al., 2005).

The RI and AI populations are derived by recurrently selfing (inbreeding) or randomly intermating the F2 individuals, respectively, for several generations. The IRI populations are derived by first producing AI populations, followed by repeated selfing. The immortalized F2 populations are obtained by first producing RI populations, followed by a generation of random mating. As a generation advances beyond F2, either by further selfing or intermating, the advanced populations must undergo multiple cycles of meiosis, so that the crossovers will accumulate and the proportions of recombinants will increase in the populations (Haldane & Waddington, 1931; Liu et al., 1996; Darvasi, 1998; Winkler et al., 2003). In the literature, it has been noted that the proportion of recombinants in RI populations can be twice that in the F2 populations for closely linked loci, and that linkage is broken down even more rapidly by random intercrossing in the AI populations (Haldane & Waddington, 1931; Darvasi, 1998). The increased number of recombinants provided by the advanced populations facilitates the construction of high-resolution genetic maps and the detection of closely linked QTLs (Liu et al., 1996; Darvasi, 1998). Further, cycles of inbreeding and/or random mating in a population will shape differences in the population genomic structures such as the homozygosity, genotypic frequencies and variance components (Weir, 1996). As such, different advanced populations produce different genomic structures to be used for different breeding and study purposes (Liu et al., 1996; Hua et al., 2002; Winkler et al., 2003; Broman, 2005).

When using these advanced populations for QTL mapping, it should be noted that their genome structures no longer have a first-order Markovian property and that their genomic constitutions differ from those of the F2 populations (Jiang & Zeng, 1997). So far, most of the current QTL mapping methods and related mapping properties have been developed and investigated for the genomes of backcross and F2 populations with the Markovian property (Lander & Botstein, 1989; Jansen, 1993; Churchill & Doerge, 1994; Zeng, 1994; Kao et al., 1999; Kao & Zeng, 2002; Kao, 2004; Xu, 2007). Although issues related to using advanced populations in QTL mapping have been raised (Jiang & Zeng, 1997; Darvasi, 1998; Martin & Hospital, 2006), these populations are still investigated by invoking this Markovian assumption. It is therefore desirable to consider the specific structures of these advanced populations for QTL detection, so that their advantages can be utilized to enhance QTL resolution. In this paper, detailed analyses and discussions related to these advanced populations are given. For samples drawn from the advanced populations, statistical methods are developed both by considering and by ignoring the specific population genome structures (i.e. without and with a first-order Markovian assumption), and the two are compared under the multiple-QTL model for QTL mapping. In addition, the QTL mapping properties across different advanced populations are derived and discussed. Simulation studies are performed for purposes of evaluation and comparison. The results show that the proposed methods can improve the resolution of the genetic architecture of quantitative traits and serve as a tool for studying QTL mapping in various advanced populations derived from two inbred lines.

2. The genome structures of advanced populations

We refer to an AI (RI) Ft population as the population obtained by intercrossing (selfing) the F2 individuals for t−2, t>3, generations. An IRI Fi:j population is a population produced by first randomly intercrossing the F2 individuals for i−2 generations, followed by j, j⩾1, cycles of selfing, and an IF2 population denotes an immortalized F2 population.

(i) Genome structure

In an F2 population, the genotypic frequencies of the P 1 homozygote, heterozygote and P 2 homozygote are 1/4, 1/2 and 1/4, respectively, for one locus, and the heterozygosity H t is 0·5. The genotypic distribution for any pair of loci, say A and B, is also well known and characterized (see, for example, Kao & Zeng, 1997), and it has a simple relationship with the recombination rate between them (r). For example, the frequency of genotype AB/AB is $(1-r)^2/4$, and the other nine genotypic frequencies have similarly simple relationships with r (see, for example, Table 2 in Kao & Zeng, 1997). Also, the proportion of recombinants (R) between A and B is equivalent to the recombination rate, i.e. R=r, and the linkage parameter between A and B is λ=1−2r in the population. In addition, a very important feature of the F2 population is that the F2 genomes have a first-order Markovian structure under the Haldane map function. This allows the distribution of multiple genes to be obtained from the distributions of pairwise genes. For example, the probability distribution of three ordered genes, A, B and C, can be derived from the probability distributions of the first pair of genes, A and B, and the second pair of genes, B and C, i.e. P(ABC)=P(AB)×P(C | B).

The heterozygosities for one locus in the RI F t and IRI F i:j populations are $(1/2)^{t-1}$ and $(1/2)^{j+1}$, respectively, which decrease as t and j increase, because selfing increases the homozygotes at the expense of the heterozygotes (each generation of selfing halves the heterozygosity); the heterozygosity is expected to be H t=0·5 in the AI F t and IF2 (RIX) populations for any t due to random mating. Also, during the process of further meiosis, crossovers will accumulate, so that the proportion of recombinants will increase and become larger than the recombination rate (R>r), and the linkage disequilibrium coefficient will decrease. To formulate these genetic parameters generally, we adopt the notation of Haldane & Waddington (1931) and define C as the frequencies of the AB/AB and ab/ab genotypes, D as the frequencies of the Ab/Ab and aB/aB genotypes, E as the frequencies of the AB/Ab, AB/aB, Ab/ab and aB/ab genotypes, F as the frequency of the AB/ab genotype and G as the frequency of the Ab/aB genotype, for any two loci A and B. The heterozygosity (H), the proportion of recombinants (R) and the linkage disequilibrium coefficient (D AB) in terms of C, D, E, F and G are

respectively, in any advanced population. In the F2 population, the frequencies C, D, E, F and G in terms of r are $C=(1-r)^2/4$, $D=r^2/4$, $E=r(1-r)/2$, $F=(1-r)^2/2$ and $G=r^2/2$, which have simple relations with r, and H=1/2, R=r and $D_{AB}=(1-2r)/4$. In advanced populations, these values in terms of r become relatively complicated and vary with t, i and j, but they can be obtained without difficulty (Jennings, 1916; Robbins, 1918; Haldane & Waddington, 1931; Winkler et al., 2003). The more important and challenging part in this QTL mapping context, under the framework of the interval mapping procedure, is to characterize the genotypic distributions of three loci for the various advanced populations, whose genomes do not have a first-order Markovian property.
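The F2 values quoted above can be checked numerically. The following is a minimal Python sketch, assuming that C, D and E each denote the frequency of a single genotype in their class (so that 2C+2D+4E+F+G=1). The function names and the expressions used for H, R and D AB in terms of the classes are illustrative choices that follow from that assumption; they are not quoted from the paper.

```python
def f2_classes(r):
    """Two-locus genotype class frequencies in an F2 for recombination rate r."""
    c = (1 - r) ** 2 / 4   # AB/AB (the same value holds for ab/ab)
    d = r ** 2 / 4         # Ab/Ab (the same value holds for aB/aB)
    e = r * (1 - r) / 2    # each of AB/Ab, AB/aB, Ab/ab, aB/ab
    f = (1 - r) ** 2 / 2   # AB/ab, coupling double heterozygote
    g = r ** 2 / 2         # Ab/aB, repulsion double heterozygote
    return c, d, e, f, g


def summary(c, d, e, f, g):
    """Total frequency, heterozygosity at locus A, recombinant proportion, LD."""
    total = 2 * c + 2 * d + 4 * e + f + g
    h = 2 * e + f + g          # genotypes that are heterozygous at locus A
    rec = 2 * d + 2 * e + g    # frequency of Ab plus aB haplotypes
    p_ab = c + e + f / 2       # frequency of the AB haplotype
    ld = p_ab - 0.25           # D_AB when both allele frequencies are 1/2
    return total, h, rec, ld


total, h, rec, ld = summary(*f2_classes(0.1))
print(total, h, rec, ld)
```

For any r between 0 and 1/2 the four printed values should agree with 1, 1/2, r and (1−2r)/4 up to floating-point error, which is also a convenient check on the class frequencies themselves.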

3. Methods

(i) Data structure

Consider a sample of size n from an advanced population, such as AI, RI, IRI or IF2 population, derived from two inbred lines. The n individuals are genotyped for markers (X i, i=1, 2, …, n) and phenotyped for traits (y i's, i=1, 2, …, n). When such a sample is used to detect QTL, two approaches under the framework of the interval mapping procedure are proposed here. The approach developed under the Markovian assumption will be hereinafter called the Markovian method, and the approach developed without the Markovian assumption will be hereinafter referred to as the non-Markovian method.

(ii) Genetic model and variance components

Consider that a trait is controlled by m QTLs, Q1, Q2, …, Qm, so that there are $3^m$ possible QTL genotypes. For any individual i, its QTL genotype is one of the $3^m$ genotypes, and the corresponding genotypic value, G i, can be expressed as

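$$G_i=\mu+\sum_{j=1}^{m}a_jx_{ij}^{*}+\sum_{j=1}^{m}d_jz_{ij}^{*}+\sum_{j<k}\left[(i_{aa})_{jk}\,x_{ij}^{*}x_{ik}^{*}+(i_{ad})_{jk}\,x_{ij}^{*}z_{ik}^{*}+(i_{da})_{jk}\,z_{ij}^{*}x_{ik}^{*}+(i_{dd})_{jk}\,z_{ij}^{*}z_{ik}^{*}\right] \qquad (1)$$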

where μ is the intercept, a j and d j are the additive and dominance effects of Qj, j=1, 2, …, m, and (i aa)jk, (i ad)jk, (i da)jk and (i dd)jk are the additive×additive, additive×dominance, dominance×additive and dominance×dominance interaction effects between Qj and Qk. The variables x ij* and z ij*, associated with a j and d j, are coded as (1,−1/2), (0,1/2) and (−1,−1/2) for genotypes Q jQj, Q jqj and q jqj, respectively, according to Cockerham's model (Kao & Zeng, 2002). Under the genetic model (1), the genetic variance of a quantitative trait can be generally decomposed into $2m^2$ variances and $2m^4-m^2$ covariances. In practice, the variance component structure will be simpler in the advanced populations, as some covariances vanish due to equal frequencies of the two alleles at any locus. Taking m=2 as an example, the genetic variance components are

(2)

The component structures allow us to investigate some QTL mapping properties. For example, the additive (dominance) variances are found to increase (decrease) in the RI or IRI population, showing that these populations may facilitate (hinder) the estimation of the additive (dominance) effects (Kao, 2006). Also, the possible confounding problems in QTL estimation may be identified from the covariances between genetic effects (Kao & Zeng, 2002; Kao, 2006). If the two-locus model is expressed as a model of 15 parameters to distinguish each allelic effect, the genetic variance becomes even more complicated (Weir & Cockerham, 1977).
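To make the coding of the genetic model concrete, the following is a minimal Python sketch of the genotypic value under Cockerham's coding as described above. It assumes that each epistatic term is the product of the corresponding coded variables (the usual form of this model); the names `CODES` and `genotypic_value` and the layout of the `epi` argument are hypothetical.

```python
# Cockerham coding from the text: genotype -> (x, z)
CODES = {"QQ": (1.0, -0.5), "Qq": (0.0, 0.5), "qq": (-1.0, -0.5)}


def genotypic_value(genotypes, mu, a, d, epi=None):
    """Genotypic value for m QTLs.

    genotypes: list of "QQ", "Qq" or "qq", one entry per QTL
    a, d:      lists of additive and dominance effects
    epi:       dict mapping (j, k) with j < k to (i_aa, i_ad, i_da, i_dd)
    """
    epi = epi or {}
    xz = [CODES[g] for g in genotypes]
    value = mu
    # main effects
    for (x, z), a_j, d_j in zip(xz, a, d):
        value += a_j * x + d_j * z
    # pairwise epistatic effects (assumed to be products of coded variables)
    for (j, k), (i_aa, i_ad, i_da, i_dd) in epi.items():
        xj, zj = xz[j]
        xk, zk = xz[k]
        value += i_aa * xj * xk + i_ad * xj * zk + i_da * zj * xk + i_dd * zj * zk
    return value


# One locus: QQ gives mu + a - d/2 and Qq gives mu + d/2
print(genotypic_value(["QQ"], 10.0, [2.0], [4.0]))
print(genotypic_value(["Qq"], 10.0, [2.0], [4.0]))
```

With m=2 the dictionary holds a single entry keyed by `(0, 1)`, which mirrors the eight effects of the two-locus model plus the intercept.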

(iii) Markovian and non-Markovian methods

With the genetic model in eqn (1), the statistical model to relate a quantitative trait value, y, to the genotypic value, G, contributed from the m QTLs at positions, p1, p2, …, and pm can be written as

$$y_i=G_i+\varepsilon_i \qquad (3)$$

where εi is the environmental deviation, assumed to follow a normal distribution with mean zero and variance σ². In QTL mapping, the QTLs are usually assumed to be located in the intervals between markers and their positions need to be estimated, so that the $3^m$ QTL genotypes (x ij* and z ij*) may not be observed, and the model becomes a normal mixture model. For n individuals, the likelihood function for θ can be generally expressed as

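$$L(\theta\mid Y,X)=\prod_{i=1}^{n}\left[\sum_{j=1}^{3^m}p_{ij}\,\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i-\mu_j)^2}{2\sigma^2}\right)\right] \qquad (4)$$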

where the mixing proportions, p ij's, j=1, 2, …, $3^m$, are the conditional probabilities of the putative QTL genotypes given marker genotypes, and μj's, j=1, 2, …, $3^m$, correspond to the genotypic values of the $3^m$ different QTL genotypes. Using the interval mapping procedure (Lander & Botstein, 1989), the conditional probabilities can be predetermined by successively and jointly using the flanking markers of the putative QTL; hence they need not be estimated. The parameters θ involved in the statistical estimation of the normal mixture model are μ, σ², a i's, d i's, i aa's, i ad's, i da's and i dd's. It should be pointed out that, in the advanced populations, deriving the conditional probabilities for each putative QTL from its flanking markers is not as straightforward as it is for the F2 and backcross populations (see below). When m putative QTLs are considered at a time, the joint conditional probability is approximated by the product of the m individual conditional probabilities. In the following, we propose two QTL mapping methods for the advanced populations under eqn (3). The one using the conditional probabilities derived from a first-order Markovian assumption as the mixing proportions will be called the Markovian method hereafter, and the other, using the conditional probabilities obtained without this assumption (by using the proposed transition equations) as mixing proportions, will be called the non-Markovian method hereafter.

(iv) Conditional probabilities of the putative QTL genotypes

The interval mapping approach computes the conditional probabilities of the genotypes of a putative QTL by using the information from its two flanking markers. Let M (with alleles M and m) and N (with alleles N and n) be the flanking markers and Q (with alleles Q and q) be the putative QTL, and let r, r 1 and r 2 be the recombination rates between M and N, between M and Q and between Q and N, respectively. To derive the conditional probability of the QTL genotype given the flanking marker genotype, P(Q | M, N)=P(MQN)/P(MN), for a population, the genotypic distributions of both two and three genes under generations of selfing and/or random mating are needed. The genotypic distribution of two genes, P(MN), under random mating and selfing is well known (Jennings, 1916; Robbins, 1918; Haldane & Waddington, 1931). For the F2 population, the derivation of the genotypic distribution for three genes, P(MQN), is simple and can be obtained by using the probabilities of two adjacent pairwise genes, P(MQ) and P(QN), as its genomes have a first-order Markovian property. That is, P(MQN)=P(M)P(Q | M)P(N | Q, M)=P(M)P(Q | M)P(N | Q), as P(N | Q, M)=P(N | Q). However, for advanced populations, this Markovian property disappears, so that the genotypic distribution of three genes cannot be obtained directly from the distributions of two genes, i.e. by simply replacing the recombination rates (r 1, r 2 and r) by the frequencies of recombinants (R 1, R 2 and R), as suggested by Jiang & Zeng (1997) and Lynch & Walsh (1998). For example, it has been suggested to approximate the two conditional gametic frequencies by Pr(Mqn | Mn)≈R 1(1−R 2)/R and Pr(MQn | Mn)≈(1−R 1)R 2/R in an advanced population. Such a replacement implicitly assumes that the genomes of the advanced populations still have a first-order Markovian property and, therefore, the obtained frequencies are approximate. Another obvious yet often unnoticed problem with this replacement is that the sum of the approximate probabilities may not be equal to one, because the Haldane map function does not hold for R (R≠R 1+R 2−2R 1R 2) in the advanced populations. Appropriate correction is needed when using these approximate probabilities. In this article, the correction is made by dividing the approximate probabilities by their sum. The derivation of the exact genotypic distribution for three genes needs more delicate considerations, as provided below.
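The normalization issue can be illustrated with a small Python sketch. It evaluates the two approximate gametic probabilities quoted above and their sum, first in the F2 (where R=r and the Haldane relation holds) and then in a fully inbred population obtained by selfing, for which it assumes the classical two-locus result R=2r/(1+2r) of Haldane & Waddington (1931). The function names are hypothetical, and only the Mn gamete class is covered.

```python
def haldane_combine(r1, r2):
    """Recombination rate across two adjacent intervals without interference."""
    return r1 + r2 - 2 * r1 * r2


def ril_selfing_recombinants(r):
    """Proportion of recombinants in fully inbred lines by selfing (assumed)."""
    return 2 * r / (1 + 2 * r)


def markovian_gametic_probs(R1, R2, R):
    """Approximate Pr(Mqn | Mn) and Pr(MQn | Mn), their sum, and the
    corrected values obtained by dividing each by the sum."""
    p_q = R1 * (1 - R2) / R
    p_Q = (1 - R1) * R2 / R
    total = p_q + p_Q
    return (p_q, p_Q), total, (p_q / total, p_Q / total)


r1 = r2 = 0.1
r = haldane_combine(r1, r2)

# F2: recombinant proportions equal recombination rates, so the sum is one
_, total_f2, _ = markovian_gametic_probs(r1, r2, r)

# Fully inbred lines by selfing: the sum departs from one
R1, R2, R = (ril_selfing_recombinants(v) for v in (r1, r2, r))
approx, total_ril, corrected = markovian_gametic_probs(R1, R2, R)
print(total_f2, total_ril, corrected)
```

In this setting the uncorrected sum exceeds one in the inbred population, and dividing by the sum restores a proper distribution; the corrected values nevertheless remain approximations, since they still rest on the Markovian assumption.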

The derivation of the genotypic frequencies of three genes for the advanced populations needs to consider two different types of mating systems: random mating and selfing. When mating is random, the frequency of a zygotic genotype is the product of two gametic frequencies in the previous population, and the focus is on deriving the transition equations for the frequencies of eight different gametic types from generation to generation. For example, in AI Ft, the probability of MQN (mqn) gamete, P 1,t, can be generally obtained as

(5)

where P 2,t–1 is the frequency of MqN (mQn) gamete, P 3,t–1 is the frequency of MQn (mqN) gamete, and P 4,t–1 is the frequency of mQN (Mqn) gamete in the previous population. An alternative iteration equation for P 1,t can be derived by using the formulation of Geiringer (1944). If the population is self-fertilized, the gametes of an individual unite at random within that individual and cannot fertilize gametes from different individuals, so the focus is on deriving the transition equations for the frequencies of 36 different zygotes from generation to generation. For example, in RI Ft population, the probability of zygote is

(6)

Similarly, the other transition equations for the three gamete frequencies under random mating and for the 35 zygote frequencies under selfing can be obtained (see Supplementary material). Used jointly, these transition equations are sufficient to obtain the gamete or genotypic frequencies needed to calculate all conditional probabilities for various fixed and unfixed advanced populations subject to different cycles of random mating and/or selfing. Teuscher & Broman (2007) developed an alternative technique that solves a set of linear equations to obtain the unknown tri-genic haplotype (gametic) probabilities for fixed RIL populations.
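The random-mating case can be sketched in a few lines of Python. The recursion below is a generic three-locus gamete-frequency update that assumes no crossover interference and an infinite population; it is written over all eight haplotypes rather than the four symmetric classes P 1 to P 4 used above, and it is not claimed to reproduce the exact form of eqn (5). All function names are hypothetical.

```python
from itertools import product


def f1_gametes(r1, r2):
    """Haplotype frequencies among gametes of an MQN/mqn F1 (no interference).
    Alleles are coded 1 (from P1) and 0 (from P2), in the order M, Q, N."""
    freq = {}
    for hap in product((0, 1), repeat=3):
        rec1 = hap[0] != hap[1]   # recombination between M and Q
        rec2 = hap[1] != hap[2]   # recombination between Q and N
        freq[hap] = 0.5 * (r1 if rec1 else 1 - r1) * (r2 if rec2 else 1 - r2)
    return freq


def random_mating_step(freq, r1, r2):
    """Gamete pool of one generation -> gamete pool of the next generation."""
    def marg(keep):
        out = {}
        for hap, p in freq.items():
            key = tuple(hap[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out

    m, q, n = marg([0]), marg([1]), marg([2])
    mq, qn, mn = marg([0, 1]), marg([1, 2]), marg([0, 2])
    new = {}
    for (a, b, c), p in freq.items():
        new[(a, b, c)] = ((1 - r1) * (1 - r2) * p              # no crossover
                          + r1 * (1 - r2) * m[(a,)] * qn[(b, c)]  # M-Q only
                          + (1 - r1) * r2 * mq[(a, b)] * n[(c,)]  # Q-N only
                          + r1 * r2 * mn[(a, c)] * q[(b,)])       # both
    return new


def recombinants_mq(freq):
    """Proportion of gametes that are recombinant between M and Q."""
    return sum(p for hap, p in freq.items() if hap[0] != hap[1])


freq = f1_gametes(0.1, 0.1)
for _ in range(3):            # three further generations of random mating
    freq = random_mating_step(freq, 0.1, 0.1)
print(recombinants_mq(freq))
```

After k steps the two-locus recombinant proportion from this sketch should match [1−(1−2r)(1−r)^k]/2, the standard decay of linkage disequilibrium under random mating, which gives a simple check on the three-locus recursion.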

The differences between the conditional probabilities of QTL genotypes given marker genotypes obtained with and without a first-order Markovian assumption can be substantial and in turn can have a considerable impact on QTL mapping (see below). For illustration, Figs 1a–d show these differences for the QQ, Qq and qq genotypes given the marker genotype MN/MN for the case of r 1=r 2=0·1 in AI F t, RI F t, IRI F 10,t and RIX F 10,t populations. For AI F t populations, the differences are generally very minor (within ~0·01; see Fig. 1a). All three curves are below zero, implying that the probabilities of QTL genotypes are underestimated under the Markovian assumption. The differences become larger (between ~−0·06 and 0·07; see Fig. 1b) in RI F t populations than in the AI F t populations. They increase during the first few generations of selfing and then stabilize. For IRI F 10,t populations, the differences are large (between ~−0·2 and 0·4) and increase with the number of selfing cycles. For RIX F 10,t populations, the differences are greatly reduced by intercrossing. In general, persistent selfing tends to enlarge the differences, and continuous intercrossing eventually mitigates them. The method with the Markovian assumption also overestimates the frequency of Qq and underestimates the other two frequencies during selfing. The sums of the three conditional probabilities are about 0·962–0·980, 0·977–0·995, 0·976–0·991 and 0·964–0·980, respectively, in the RI, AI, IRI and RIX populations. Figs 2a–d show the numerical differences in conditional probability for the QQ, Qq and qq genotypes given the marker genotype MN/Mn. Larger differences are observed in the Mn/Mn class, and the sum of the conditional probabilities may be up to 1·125 (not shown). 
Therefore, it is important to compute the correct conditional probabilities of the putative QTL genotypes, as they serve as the mixing proportions of the normal mixture model in QTL mapping. The problem of using incorrect (approximate) conditional probabilities of QTL genotypes includes the loss of power and precision in QTL detection as mentioned by Martin & Hospital (2006) and shown in this paper (see the Simulation study section).

Fig. 1. The differences between the conditional probabilities of QQ, Qq and qq genotypes given the flanking marker genotype MN/MN obtained by using the Markovian and non-Markovian methods for the case of r 1=0·1 and r 2=0·1 in the AI, RI, IRI and RIX populations. The curve below zero implies that the probabilities of QTL genotypes are underestimated by using the Markovian method. (a) AI populations. (b) RI populations. (c) IRI F10,t populations. (d) RIX F10,t populations.

Fig. 2. The differences between the conditional probabilities of QQ, Qq and qq genotypes given the flanking marker genotype MN/Mn obtained by using the Markovian and non-Markovian methods for the case of r 1=0·1 and r 2=0·1 in the AI, RI, IRI and RIX populations. The curve below zero implies that the probabilities of QTL genotypes are underestimated by using the Markovian method. (a) AI populations. (b) RI populations. (c) IRI F10,t populations. (d) RIX F10,t populations.

(v) Maximum likelihood estimation

In parameter estimation, the normal mixture model in eqn (4) can be treated as an incomplete-data problem by regarding the trait, Y, and markers, X, as observed data and the putative QTL genotypes, x ij*'s and z ij*'s, as missing data; the EM algorithm (Dempster et al., 1977) can then be readily implemented to obtain the maximum likelihood estimates (MLEs) of the parameters. Alternatively, the marker genotypes and the unknown QTL genotypes can be treated as the observed state and hidden state in the set-up of the hidden Markov model (HMM; Koski, 2001) under the Markovian assumption along the genome. The EM algorithm is an iterative procedure in which each iteration consists of an expectation step (E-step) followed by a maximization step (M-step). When applying the EM algorithm, the general formulae devised by Kao & Zeng (1997) can be used to obtain the MLEs. The E-step computes the posterior probabilities of the $3^m$ QTL genotypes. In the M-step, the coded variables associated with the m QTLs in all the $3^m$ possible genotypic values are assigned to the elements of the genetic design matrix. The E- and M-steps are iterated until convergence, and the converged values are the MLEs.
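The E- and M-steps can be sketched as follows in Python. This is a simplified stand-in, not the general formulae of Kao & Zeng (1997): it estimates the genotype means μj and the common variance directly instead of estimating the genetic effects through the design matrix, and it takes the mixing proportions p ij as given. The function name `em_mixture` and its arguments are hypothetical.

```python
import math


def em_mixture(y, p, n_iter=200, tol=1e-10):
    """EM for a normal mixture with known mixing proportions p[i][j].

    Returns the component means, the common variance and the log-likelihood
    evaluated at the parameters used in the last E-step."""
    n, k = len(y), len(p[0])
    mean_y = sum(y) / n
    var = sum((v - mean_y) ** 2 for v in y) / n or 1.0
    # starting means spread around the overall mean
    mu = [mean_y + (j - (k - 1) / 2) * math.sqrt(var) for j in range(k)]
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each genotype for each individual
        post, new_loglik = [], 0.0
        for yi, pi in zip(y, p):
            w = [pij * math.exp(-(yi - m) ** 2 / (2 * var))
                 / math.sqrt(2 * math.pi * var) for pij, m in zip(pi, mu)]
            s = sum(w)
            new_loglik += math.log(s)
            post.append([wj / s for wj in w])
        # M-step: weighted means, then the pooled variance
        for j in range(k):
            wsum = sum(row[j] for row in post)
            if wsum > 0:
                mu[j] = sum(row[j] * yi for row, yi in zip(post, y)) / wsum
        var = sum(row[j] * (yi - mu[j]) ** 2
                  for row, yi in zip(post, y) for j in range(k)) / n
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return mu, var, loglik
```

When every row of `p` puts all its mass on one genotype (that is, the QTL sits on a fully informative marker), the estimates reduce to the ordinary group means and the pooled within-group variance, which is a useful sanity check before supplying conditional probabilities from either the Markovian or the non-Markovian method.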

(vi) QTL mapping properties

To investigate and explore QTL mapping properties across populations, without loss of generality, assume that the quantitative trait is affected by the two linked epistatic QTLs, QA and QB, with complete effects. We consider the scenarios of using QA only and of using both QA and QB in the quantitative trait analysis. If the quantitative trait is regressed on QA only, the regression coefficient for the additive effect of QA is

(7)

in an advanced population. Similarly, the regression coefficient for the dominance effect of QA, d A, and the partial regression coefficient for the additive (dominance) effect of QA given the additive (dominance) effect of QB, a A.Ba (d A.Ba ), can be derived, and their components are shown in Table 1. By analysing the coefficients, it is possible to decompose the regression coefficient into components and to trace the changes of these components, in order to identify the confounding problems as the population advances. Taking Eqn (7) as an example, under selfing, the coefficient associated with a 2 (i da) is positive (negative) and decreasing (increasing) from 1−2r (−(1−2r)/2) to , and the coefficient associated with i ad is negative and decreasing from $-(1-2r)^2/2$ to −1/2, as generation proceeds (t increases). For t→∞ under selfing, . If mating is random, the coefficient can be generally expressed as . The coefficients associated with a 2, i ad and i da approach zero as t→∞. Such analyses make it possible to identify clearly how the different genotypes and effects play a role in the confounding problem across populations. In general, the confounding problem becomes less severe as the generation proceeds under random mating. Under selfing, the confounding of i ad becomes more severe and the confounding of i da becomes less severe in the estimation of the additive effects of QA as generation proceeds. The confounding of i dd becomes more severe, and i aa will always be confounded, in the estimation of the dominance effects as generation proceeds by selfing.

Table 1. The components of the regression coefficient and partial regression coefficient

Assume that the quantitative trait is controlled by two QTLs, QA and QB. a 1 and d 1 (a 2 and d 2) are the additive and dominance effects of QA (QB). i aa, i ad, i da and i dd are their epistatic effects.

a A (d A) is the regression coefficient for the additive (dominance) effect of QA, and a A.Ba (d A.Ba) is the partial regression coefficient for the additive (dominance) effect of QA given the additive (dominance) effect of QB.

(vii) Power of separating closely linked QTL

To simplify the discussion, we first consider two linked QTLs with additive effects only, a 1 and a 2, located at known markers; the QTL mapping model in eqn (3) then reduces to a regression model fitting two correlated variables, x i1* and x i2*. As derived above, the correlation between x i1* and x i2* is equivalent to the linkage parameter between the two QTLs, λ=(C−D)/(C+D+E), which can be interpreted as a measure of the difference between the non-recombinant (C) and recombinant (D) proportions in a population. We can expect the linkage parameter to decrease for more distant genes or in later populations, as there are more recombinants and fewer non-recombinants in either case. In statistical modelling, fitting correlated variables into the model raises the problem of collinearity, e.g. inflated variances of â 1 and â 2, in estimation and testing (Marquardt, 1970), making it difficult to obtain simultaneously significant tests for QTL effects (successful separation of linked QTLs). For example, in the AI F t population (under the process of random mating), C+D+E=1/4 and C−D=(1−2r′)/4, where $r'=[1-(1-2r)(1-r)^{t-2}]/2$, so that λ=1−2r′ decreases with t, being multiplied by 1−r in each generation of random mating. Under selfing, λ also decreases, but at a much lower rate. In RIL, λ=(1−2r)/(1+2r), which is smaller than the value 1−2r in the F2. In general, the linkage parameter decreases and the collinearity problem is eased in the advanced populations. As a consequence, the separation of closely linked QTLs can be more powerful when using a sample from an advanced population, especially from a population subject to several cycles of random mating.
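The effect of the linkage parameter on collinearity can be illustrated with a short Python sketch. It uses the expressions for λ given above for AI F t and RIL populations, together with the standard variance inflation factor 1/(1−λ²) for a regression on two variables with correlation λ; the use of that factor as a summary, and the function names, are illustrative assumptions rather than part of the original analysis.

```python
def lambda_ai(r, t):
    """Linkage parameter in an AI F_t population, t >= 2 (t = 2 is the F2)."""
    return (1 - 2 * r) * (1 - r) ** (t - 2)


def lambda_ril(r):
    """Linkage parameter in recombinant inbred lines derived by selfing."""
    return (1 - 2 * r) / (1 + 2 * r)


def variance_inflation(lam):
    """Variance inflation factor for two regressors with correlation lam."""
    return 1 / (1 - lam ** 2)


r = 0.1
for label, lam in [("F2", lambda_ai(r, 2)),
                   ("RIL", lambda_ril(r)),
                   ("AI F10", lambda_ai(r, 10))]:
    print(label, round(lam, 4), round(variance_inflation(lam), 4))
```

For a fixed r the inflation factor falls as λ falls, so in this simplified additive setting the ordering of the three populations by λ is also their ordering by the severity of collinearity.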

4. Simulation studies

Simulations were conducted to evaluate the performances of the non-Markovian and Markovian methods, to validate the derived mapping properties and to compare the relative efficiencies of using different advanced populations in QTL mapping. A large set of fixed and unfixed populations, including RI, AI, IRI and IF2 populations, was simulated, as they are very popular in biological studies (Lee et al., 2002; Rockman & Kruglyak, 2008). For RI and AI populations, F 3, F 4, F 5 and F 10 populations were simulated. For IRI and RIX populations, IRI F 5:1, F 5:3 and IF2 populations were simulated. For each population, two linked epistatic QTLs, QA and QB, with complete effects a 1=2, d 1=2, a 2=2, d 2=2, i aa=2, i da=2 and i dd=2 are considered, and the heritability is assumed to be 0·05 (defined in the F2 population under the Cockerham model by Kao & Zeng, 2002). With such parameter settings, the total genetic variance and environmental variance are 6·32 and 120·88, respectively, and the genetic variances contributed by the marginal effects and epistatic effects and genetic covariance are 3, 2·227 and −1·865, respectively. The positions of the two QTLs were assumed to be 30 cM apart and located at 25 and 55 cM along one 100 cM chromosome. Two marker maps are considered. The first map assumes 11 equally spaced markers (the sparse map hereinafter), and the second map assumes 19 markers placed at 0, 10, 15, 20, 24, 27, 30, 35, 40, 45, 50, 54, 57, 60, 65, 70, 80, 90 and 100 cM (the dense map hereinafter). The sample size is 1000 and the number of simulated replicates is 100 for each setting. The applied mapping models are all two-QTL models with different fixed numbers of effects. Except for the RI F 10 population (RIL), the mapping models applied to QTL detection include the eight-effect (complete-effect) model, the five-effect model (with a 1, d 1, a 2, d 2 and i aa) and the four-effect model (with a 1, d 1, a 2 and d 2). 
For RIL, the three-effect model with epistasis (with a 1, a 2 and i aa) and the two-effect model without epistasis (with a 1 and a 2) are applied to the analysis as RIL has very few heterozygotes and low power to detect dominance components. These models are applied to a two-dimensional grid search on the chromosome for QTL. At the positions with maximum value of the likelihood function, we test the significance of the first (second) QTL given the second (first) QTL by testing its main and epistatic effects jointly. For example, given the second (first) QTL, the hypothesis H 0: a 1=d 1=i aa=i ad=i da=i dd=0 (H 0: a 2=d 2=i aa=i ad=i da=i dd=0) is tested for the existence of the first (second) QTL at the positions if the complete-effect model is used. Similarly, if the five-effect (four-effect) model is used, the hypothesis H0: a 1=d 1=i aa=0 (H 0: a 1=d 1=0) is tested for the existence of the first QTL given the second QTL. If both the LRT statistics are larger than the specified critical values at 5% level, a successful detection of the two QTLs (separation of the two linked QTLs) is declared at the tested positions, and the corresponding estimated effects are reported as the MLE of the effects. In QTL mapping, the issue of determining the critical value for declaring QTL detection has been very complicated, and several methods have been suggested to determine the critical value (see for a review, Zou & Zeng, 2008). Here, the critical values are evaluated using the quick method of Piepho (2001) as this method can handle a wide variety of experimental designs, such as the AI, RI, IRI and IF2 populations considered here.
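To make the two marker maps concrete, the short sketch below lists them and finds the marker interval that brackets each simulated QTL. It assumes that the 11 equally spaced markers of the sparse map run from 0 to 100 cM in 10 cM steps, which the text implies but does not state; the function and variable names are illustrative only.

```python
from bisect import bisect_right

# Sparse map: 11 equally spaced markers, assumed here to sit at 0, 10, ..., 100 cM.
SPARSE_MAP = [10 * k for k in range(11)]
# Dense map: the 19 marker positions listed in the text (cM).
DENSE_MAP = [0, 10, 15, 20, 24, 27, 30, 35, 40, 45,
             50, 54, 57, 60, 65, 70, 80, 90, 100]
QTL_POSITIONS = {"QA": 25, "QB": 55}


def flanking_markers(position, marker_map):
    """Return the (left, right) marker positions that bracket `position`.

    A position that coincides with a marker is paired with the next marker
    to its right, except at the chromosome end (an illustrative convention).
    """
    idx = bisect_right(marker_map, position)
    idx = min(max(idx, 1), len(marker_map) - 1)
    return marker_map[idx - 1], marker_map[idx]


for name, pos in QTL_POSITIONS.items():
    print(name, "sparse:", flanking_markers(pos, SPARSE_MAP),
          "dense:", flanking_markers(pos, DENSE_MAP))
```

Under these assumptions each QTL sits in the middle of a 10 cM interval on the sparse map, but within a 3 cM interval (24–27 cM and 54–57 cM) on the dense map, so the dense map places its extra markers close to the two QTLs.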

The non-Markovian method clearly performs better than the Markovian method in the populations subject to selfing, such as the RI and IRI populations. For AI and RIX populations, the two methods have similar powers, but the non-Markovian method provides more precise and accurate estimates of the positions and effects. To condense the tables, only the results under the sparse map are tabulated in Tables 2–4; those under the dense map are not tabulated, but are described in the text. Table 2 shows the QTL mapping results under the sparse map in the RI populations. Under the sparse (dense) map, when the complete-effect model is applied to QTL detection, the powers of separation in the RI F3, F4 and F5 populations are 0·39 (0·18), 0·23 (0·05) and 0·10 (0·11), respectively, by the non-Markovian approach, and they are 0·29 (0·19), 0·16 (0·03) and 0·04 (0·13), respectively, by the Markovian approach. The complete-effect model becomes less powerful in the later RI populations due to the loss of heterozygotes. When epistasis is completely ignored by applying the four-effect model, the powers are lower than those of the complete-effect model. The powers by the non-Markovian method are 0·09 (0·00), 0·02 (0·03) and 0·04 (0·11) for the three populations, respectively, and they are 0·10 (0·01), 0·03 (0·04) and 0·07 (0·14), respectively, by the Markovian method under the sparse (dense) map. When the five-effect model, which includes i aa, is applied to QTL detection, the powers by the non-Markovian method are 0·55 (0·67), 0·73 (0·84) and 0·68 (0·98), respectively, and they are 0·57 (0·66), 0·74 (0·86) and 0·67 (0·99), respectively, by the Markovian method under the sparse (dense) map. In parameter estimation, for all models, the estimates of positions and effects obtained by the non-Markovian method are more precise than those obtained by the Markovian method.
For example, in the RI F4 population under the sparse map, the means of the estimated QA and QB positions for the five-effect model are 25·36 (SD 5·94) and 54·90 (SD 5·74), respectively, by the non-Markovian method, and they are 26·08 (SD 5·98) and 56·70 (SD 5·67), respectively, by the Markovian method. The five-effect model by taking i aa into account tends to be more powerful and precise than the other two models, and this model becomes more powerful in the later RI populations. For the RI F10 population (RIL), when using the three-effect model, the powers of the non-Markovian (Markovian) method are 93% (94%) and 98% (97%) in the two maps. When using the two-effect model, the powers reduce dramatically to 5% (5%) and 8% (7%), respectively. This shows that the power to detect QTL can be greatly enhanced by taking i aa into account in RIL. Confounding problems occur in the estimation of the effects if epistatic effects are not completely taken into account. For example, the means of the estimated a 1, a 2 and i aa by the non-Markovian method are 1·031 (SD 0·388), 1·032 (SD 0·402) and 1·965 (SD 0·375), respectively (the predicted values by Table 1 are 1, 1 and 2) for RIL, under the dense map. It is interesting to compare these results with those in the F2 population. The powers in the F2 population are 0·42 (0·36), 0·21 (0·43) and 0·03 (0·05) for the complete-effect, five-effect and four-effect models, respectively, under the sparse (dense) map (Table 4). The more powerful performance of using the RI populations occurs only for the five-effect model and does not occur for the other two models.
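When reading the powers reported here, it may help to keep in mind that each is a proportion estimated from 100 replicates. The sketch below computes the usual binomial standard error of such a proportion; it is a reader's aid under the assumption that replicates are independent, not part of the analysis reported in this study, and the function name is illustrative.

```python
import math


def power_standard_error(power, n_replicates=100):
    """Binomial standard error of a power estimated from independent replicates."""
    return math.sqrt(power * (1.0 - power) / n_replicates)


# Example: the complete-effect model in the RI F3 under the sparse map.
for label, power in [("non-Markovian", 0.39), ("Markovian", 0.29)]:
    print("%-14s power = %.2f  SE = %.3f" % (label, power, power_standard_error(power)))
```

With 100 replicates the standard error is at most 0·05 (reached at a power of 0·5), so differences of a few percentage points between two methods are within simulation noise. Because both methods are applied to the same simulated data sets, a formal comparison would need a paired analysis rather than these marginal standard errors.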

Table 2. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in the RI populations

A total of 100 replicates, each with sample size 1000, were analysed with two linked epistatic QTLs, QA and QB. The heritability is 0·05 in the F2 population. The critical values are determined by Piepho's method. P1 (P2): position of QA (QB). To save space, standard deviations (SD; numbers in parentheses) are shown only for the complete-effect model. The reduced models usually show similar or larger SD. SD are smaller in RIL than in the RI F3.

a 8e/8a indicates the eight-effect model with the non-Markovian/Markovian method.

Table 3. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in the different AI populations

To save space, SD (numbers in parentheses) are shown only for the complete-effect model in the AI F3 and F4 populations. SD for the reduced models are usually similar or larger. Compared with those in the AI F3, the SD in the AI F5 are of similar size for positions and main effects and larger for epistatic effects. The estimates in the AI F10 have a much larger SD. A total of 100 replicates, each with sample size 1000, were analysed with two linked epistatic QTLs, QA and QB. The heritability is 0·05 in the F2 population. The critical values are determined by Piepho's method.

a 8e/8a indicates the eight-effect model with the non-Markovian/Markovian method.

Table 4. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in F2, IF2 and IRI populations

To save space, SD (numbers in parentheses) are shown only for the complete-effect model. SD for the reduced models are usually similar or larger. A total of 100 replicates, each with sample size 1000, were analysed with two linked epistatic QTLs, QA and QB. The heritability is 0·05 in the F2 population. The critical values are determined by Piepho's method. P1 (P2): position of QA (QB).

a 8e/8a indicates the eight-effect model with the non-Markovian/Markovian method.

Table 3 presents the QTL mapping results for the AI populations under the sparse map. Under the sparse (dense) map, when the complete-effect model is considered, the detection powers by the non-Markovian method are 0·61 (0·61), 0·62 (0·52), 0·47 (0·79) and 0·06 (0·65), respectively, in the AI F3, F4, F5 and F10 populations, and they are 0·59 (0·65), 0·59 (0·51), 0·46 (0·79) and 0·07 (0·70), respectively, by the Markovian method. When epistasis is ignored by using the four-effect model, the powers are reduced to 0·11 (0·11), 0·09 (0·08), 0·25 (0·17) and 0·12 (0·05), respectively, by the non-Markovian (Markovian) method. If the five-effect model is considered under the sparse map, the powers by the non-Markovian (Markovian) method are 0·41 (0·39), 0·63 (0·37), 0·40 (0·41) and 0·09 (0·11), respectively, in the four populations. An increasing trend in power can be observed in the case of the dense map (not shown). However, such an increasing trend does not occur in the sparse map (Table 3). Also, by taking epistasis into account, the power can be much improved and the confounding problem can be avoided, and the means of the estimated effects are all very close to the true parameter values. In addition, the QTL positions are estimated with better precision in the AI populations than in the RI populations. Among all the settings, the most powerful experimental population for QTL detection is the AI F3 (AI F5) population under the sparse (dense) map. The AI F10 population is not the optimal design under either map, as the powers are about 0·05–0·12 and about 0·45–0·70, respectively, in the two maps. It is expected that a much denser map is required to ensure more powerful QTL detection in the AI F10 population (see the Discussion section). When comparing the results of the AI and F2 populations (Table 4), the AI populations show more powerful results than the F2 population in all cases under the dense map.

Table 4 shows the QTL mapping results in the F2, IF2, IRI F5:1 and IRI F5:3 populations under the sparse maps. The QTL mapping results are better under the dense map in these later advanced populations as compared with those under the sparse map. For example, in the IF2 population, the powers under the dense map are 0·59 (0·58), 0·54 (0·52) and 0·35 (0·35) by the non-Markovian (Markovian) method for the complete-effect, five-effect and four-effect models (not shown), respectively. Under the sparse map, they are 0·06 (0·05), 0·05 (0·05) and 0·02 (0·03), respectively. The estimated positions and effects are also found to be more precise in the dense map. For example, under the complete-effect model, the estimated effects of a 1, d 1, a 2 and d 2 by the non-Markovian method are 1·919 (SD 0·657), 1·689 (SD 1·010), 1·757 (SD 0·670) and 1·677 (SD 0·912), respectively, in the dense map (not shown), and they are 1·241 (SD 1·028), 0·766 (SD 1·381), 1·351 (SD 0·907) and 0·850 (SD 1·512), respectively, in the sparse map. Similar situations were also found in the IRI F5:1 and IRI F5:3 populations. Besides, the complete-effect model is not appropriate for the IRI populations, and the three-effect and five-effect models are more appropriate for these two populations. For example, the powers in the IRI F5:1 population are 0·13 (0·01) and 0·07 (0·01) by the complete-effect model of the non-Markovian (Markovian) method in the two different maps, and the powers become 0·77 (0·76) and 0·92 (0·93) by the five-effect model, respectively. Also, taking the additive-by-additive effect into account can greatly benefit the QTL detection. A similar trend can be observed for the RIL.

5. Discussion

As mentioned before, the genome structures of the advanced populations can be very different from each other and are no longer similar to that of the F2 population. This paper distinguishes between the genome structures of different populations in dealing with the issues of QTL mapping. For QTL mapping in the advanced populations, we propose the Markovian and non-Markovian methods. Some important properties and issues in QTL mapping, such as mapping closely linked QTLs, the confounding problems of ignoring epistasis and the choice among different mapping models, are also derived and discussed across different populations. Theoretically, the non-Markovian method has better performance than the Markovian method, as more accurate mixing proportions can be used in statistical modelling, as discussed. Indeed, analytical and simulation studies show that the non-Markovian method does perform better than the Markovian method in the advanced populations, especially in populations subject to selfing. The advanced populations can also be designed to be more powerful than the F2 population in QTL detection. The issues considered here are, however, treated under the assumption of a large sample size with no selection. In practice, selection and drift may play a role between generations, causing unequal allele frequencies and potential segregation distortion. As suggested by Teuscher & Broman (2007), the solution to these problems is the use of a dense marker set with which the actual recombination breakpoints can be precisely mapped. The results presented here can give some clues to the use of advanced populations for better investigation in genetical and biological studies.

The quality of QTL mapping relies on precisely deriving the conditional probabilities of putative QTL genotypes given marker genotypes and on applying appropriate statistical methods to link the quantitative traits with the putative QTL. When deriving the conditional genotypic distribution of a putative QTL, ideally, we would like to use the information from all the linked markers (as many linked markers as possible) to obtain it. This, however, is very challenging as the characterization of the genotypic distribution of many genes is not an easy task. The approach of interval mapping avoids this and proposes to use its two flanking markers instead in derivation, so that its task reduces to characterizing the genotypic distribution for three genes. Such an approach is optimal in capturing the QTL information for the genomes with a first-order Markovian property, but not for the genomes without this property. However, for the latter case, we believe that the closest marker pair may have already captured most of the information about QTLs. When multiple putative QTLs are considered in the advanced population, the joint conditional probability distribution used here is approximate and obtained by using conditional independence property as we are still not sure currently how to derive the exact conditional distribution for an arbitrary number of putative QTLs. In addition, when applying statistical models to detect QTLs, the specific genome structures of advanced populations have to be taken into account in modelling to benefit QTL detection. For example, in the RI or IRI populations, there are larger additive genetic variances (smaller dominance variances) and higher homozygosity (lower heterozygosity), and the applied models should consider that fitting the components involving additive effects into the model can benefit QTL detection and that fitting the components involving dominance effects into the model may deter QTL detection.
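The flanking-marker idea described above can be illustrated in its simplest setting: a single gamete produced by the F1, with no crossover interference, so that the first-order Markov property holds exactly. The sketch below is only that textbook case, not the conditional distributions derived in this paper for the advanced populations; the function name and the 0/1 allele coding are assumptions made for illustration.

```python
def qtl_allele_prob(r1, r2, left_allele, right_allele):
    """P(QTL allele comes from parent 1 | flanking marker alleles) for an F1 gamete.

    r1: recombination fraction between the left marker and the QTL.
    r2: recombination fraction between the QTL and the right marker.
    Alleles are coded 1 (parent 1) or 0 (parent 2); no interference is assumed.
    """
    def transition(a, b, r):
        # Probability that adjacent loci carry alleles a and b on one gamete.
        return 1.0 - r if a == b else r

    from_p1 = transition(left_allele, 1, r1) * transition(1, right_allele, r2)
    from_p2 = transition(left_allele, 0, r1) * transition(0, right_allele, r2)
    return from_p1 / (from_p1 + from_p2)


# Hypothetical example: QTL in the middle of an interval with r1 = r2 = 0.05.
for left, right in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(left, right, round(qtl_allele_prob(0.05, 0.05, left, right), 4))
```

When the flanking markers are non-recombinant, the QTL allele is almost certainly of the same parental type; when they are recombinant and the QTL is in the middle, the two QTL alleles are equally likely. In populations without the Markov property, markers outside the interval would still carry some information, which is the point made in the paragraph above.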

One of the most valuable features of the advanced populations is that they can generate more recombinants to improve the QTL resolution. From the viewpoint of statistical modelling, such an improvement takes advantage of more recombinants in a population to alleviate the collinearity problem in modelling linked putative QTLs (to break down the linkage disequilibrium between linked putative QTLs), so that QTL mapping can be more powerful and precise (see the subsection ‘Power of separating closely linked QTLs’); nevertheless, more recombinants also reduce the linkage disequilibrium between markers and QTLs, which blurs the information about the unobservable putative QTL. Therefore, to expect improved QTL mapping results in the advanced population, a denser marker map around the linked QTL region is required to ensure that the linkage disequilibrium is strong enough in the construction. In a marker interval of given width, the linkage disequilibrium between markers and putative QTLs is strongest in the F2 population, and it becomes gradually weaker as generations advance. Taking a putative QTL Q in the middle of a 10 cM marker interval flanked by markers A and B as an example, the trigenic linkage disequilibrium, defined as D_AQB = P_AQB − P_A P_Q P_B (Wright, 1980), is 0·329 in the F2 population, and it becomes 0·309 (0·300), 0·286 (0·292), 0·260 (0·290) and 0·111 (0·288) in the AI (RI) F3, F4, F5 and F10 populations, respectively. This shows that the linkage disequilibrium declines more rapidly under random mating. In general, once the designed populations, such as the IF2, IRI F5:1 and AI F10 populations, have undergone some generations of random mating, they usually require a much denser marker map to obtain improved results.
Therefore, the marker density should be considered as a major factor not only in the comparison between the two proposed methods, but also in the issue of using advanced populations to improve QTL mapping results (see also the ‘Simulation studies’ section). Besides, the issues of trade-off between generation number and marker density and of extension to more than two founders (Mott et al., 2000; Broman, 2005) are interesting and worthy of pursuing in the future. Together with the (Fu/Fv, vu) designs (Fisch et al., 1996; Kao, 2006) and the strategy of replicated trials (Hua et al., 2002), it is very much possible for us to design experimental populations to recover or remove those undetected or ghost QTLs (Lander & Botstein, 1989) in the F2 population for high-resolution QTL mapping.
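The F2 value of the trigenic linkage disequilibrium quoted above, 0·329 for a QTL in the middle of a 10 cM interval, can be reproduced with a few lines under one reading of the definition D_AQB = P_AQB − P_A P_Q P_B. The sketch assumes that the frequencies are gametic frequencies among the gametes produced by the F1, that the Haldane map function applies and that there is no interference; these assumptions and the function names are the editor's, and the sketch does not attempt the AI or RI values, which require the population-specific frequencies.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a distance in cM (Haldane map function; an assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def trigenic_ld_f2(interval_cm):
    """D_AQB = P_AQB - P_A * P_Q * P_B for a QTL in the middle of the interval.

    P_AQB is taken as the frequency of the parental haplotype A-Q-B among F1
    gametes: no recombination in either half interval, times 1/2 for the
    choice of parental chromosome. All allele frequencies equal 1/2.
    """
    r_half = haldane_r(interval_cm / 2.0)
    p_aqb = 0.5 * (1.0 - r_half) ** 2
    return p_aqb - 0.5 ** 3


print(round(trigenic_ld_f2(10.0), 3))
```

As the interval shrinks to zero the value approaches 0·375, the maximum possible with allele frequencies of 1/2, which is consistent with the need for narrower marker intervals when recombination has eroded the disequilibrium.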

The authors are grateful to two anonymous reviewers for helpful comments. This work was supported by grant number NSC97-2118-M-001–008 from the National Science Council, Taiwan, Republic of China.

References

Broman, K. W. (2005). The genomes of recombinant inbred lines. Genetics 169, 1133–1146.
Churchill, G. A. & Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics 138, 967–971.
Darvasi, A. (1998). Experimental strategies for the genetic dissection of complex traits in animal models. Nature Genetics 18, 19–24.
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39, 1–38.
Fisch, R. D., Ragot, M. & Gay, G. (1996). A generalization of the mixture model in the mapping of quantitative trait loci for progeny from a biparental cross of inbred lines. Genetics 143, 571–577.
Geiringer, H. (1944). On the probability theory of linkage in Mendelian heredity. The Annals of Mathematical Statistics 15, 25–57.
Haldane, J. B. S. & Waddington, C. H. (1931). Inbreeding and linkage. Genetics 16, 357–374.
Hua, J. P., Xing, Y. Z., Xu, C. G., Sun, X. L., Yu, S. B. & Zhang, Q. (2002). Genetic dissection of an elite rice hybrid revealed that heterozygotes are not always advantageous for performance. Genetics 162, 1885–1895.
Jansen, R. C. (1993). Interval mapping of multiple quantitative trait loci. Genetics 135, 205–211.
Jennings, H. S. (1916). The numerical results of diverse systems of breeding. Genetics 1, 53–89.
Jiang, C.-J. & Zeng, Z.-B. (1997). Mapping quantitative trait loci with dominant and missing markers in various populations from inbred lines. Genetica 101, 47–85.
Kao, C.-H. (2004). Multiple interval mapping for quantitative trait loci controlling endosperm traits. Genetics 167, 1987–2002.
Kao, C.-H. (2006). Mapping quantitative trait loci using the experimental designs of recombinant inbred population. Genetics 174, 1373–1386.
Kao, C.-H. & Zeng, Z.-B. (1997). General formulas for obtaining the MLE and the asymptotic variance–covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53, 359–371.
Kao, C.-H. & Zeng, Z.-B. (2002). Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 160, 1243–1261.
Kao, C.-H., Zeng, Z.-B. & Teasdale, R. D. (1999). Multiple interval mapping for quantitative trait loci. Genetics 152, 1203–1216.
Koski, T. (2001). Hidden Markov Models for Bioinformatics. Boston, MA: Kluwer Academic Publishers.
Lander, E. S. & Botstein, D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199.
Lee, M., Sharopova, N., Beavis, W. D., Grant, D., Katt, M., Blair, D. & Hallauer, A. (2002). Expanding the genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Molecular Biology 48, 453–461.
Liu, S.-C., Kowalski, S. P., Lan, T.-H., Feldmann, K. A. & Paterson, A. H. (1996). Genome-wide high-resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics 142, 247–258.
Lynch, M. & Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates.
Marquardt, D. W. (1970). Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 12, 591–612.
Martin, O. C. & Hospital, F. (2006). Two- and three-locus tests for linkage analysis using recombinant inbred lines. Genetics 173, 451–459.
Mott, R., Talbot, C. J., Turri, M. G., Collins, A. C. & Flint, J. (2000). From the cover: a method for fine mapping quantitative trait loci in outbred animal stocks. Proceedings of the National Academy of Sciences USA 97, 12648–12654.
Piepho, H. P. (2001). A quick method for computing approximate threshold for quantitative trait loci detection. Genetics 157, 425–432.
Robbins, R. B. (1918). Some applications of mathematics to breeding problems III. Genetics 3, 375–389.
Rockman, M. L. & Kruglyak, L. (2008). Breeding designs for recombinant inbred advanced intercross lines. Genetics 179, 1069–1078.
Teuscher, F. & Broman, K. W. (2007). Haplotype probabilities for multiple-strain recombinant inbred lines. Genetics 175, 1267–1274.
Weir, B. S. (1996). Genetic Data Analysis II. Sunderland, MA: Sinauer Associates.
Weir, B. S. & Cockerham, C. C. (1977). Two-locus theory in quantitative genetics. Proceedings of the International Conference on Quantitative Genetics (ed. Pollak, E., Kempthorne, O. & Bailey, T. B.), pp. 247–269. Ames, IA, USA: Iowa State University.
Winkler, C. R., Jensen, N. M., Cooper, M., Podlich, D. W. & Smith, O. S. (2003). On the determination of recombination rates in intermated recombinant inbred populations. Genetics 164, 741–745.
Wright, S. (1980). Genic and organismic selection. Evolution 34, 825–843.
Xu, S. (2007). An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63, 513–521.
Zeng, Z.-B. (1994). Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.
Zou, F., Gelfond, J. A. L., Airey, D. C., Lu, L., Manly, K. F., Williams, R. W. & Threadgill, D. W. (2005). Quantitative trait locus analysis using recombinant inbred intercrosses: theoretical and empirical considerations. Genetics 170, 1299–1311.
Zou, W. & Zeng, Z.-B. (2008). Statistical methods for mapping multiple QTL. International Journal of Plant Genomics, Article ID 286561, doi: 10.1155/2008/286561.