Bayesian analysis of quantitative traits using skewed distributions

L. VARONA; N. IBAÑEZ-ESCRICHE; R. QUINTANILLA; J. L. NOGUERA; J. CASELLAS

doi:10.1017/S0016672308009233

Bayesian analysis of quantitative traits using skewed distributions

Published online by Cambridge University Press: 22 April 2008

L. VARONA ,

J. L. NOGUERA and

L. VARONA*: Affiliation:
Genètica i Millora Animal, IRTA-Lleida, Av. Rovira Roure 191, 25198 Lleida, Spain
N. IBAÑEZ-ESCRICHE: Affiliation:
Genètica i Millora Animal, IRTA-Lleida, Av. Rovira Roure 191, 25198 Lleida, Spain
R. QUINTANILLA: Affiliation:
Genètica i Millora Animal, IRTA-Lleida, Av. Rovira Roure 191, 25198 Lleida, Spain
J. L. NOGUERA: Affiliation:
Genètica i Millora Animal, IRTA-Lleida, Av. Rovira Roure 191, 25198 Lleida, Spain
J. CASELLAS: Affiliation:
Genètica i Millora Animal, IRTA-Lleida, Av. Rovira Roure 191, 25198 Lleida, Spain
*: *Corresponding author. Tel: +34 973003441. Fax. +34 973238301. e-mail: Luis.Varona@irta.es

Article contents

Summary
Introduction
Materials and methods
Results and discussion
References

Rights & Permissions

Summary

Statistical models for genetic evaluation often make use of Gaussian distributions. However, some new statistical developments allow the use of an asymmetric distribution for the residuals. Within this context, we analysed three different patterns for the residual term on a data set consisting of 63 208 litter-size records, belonging to 19 255 sows, with a pedigree including 27 911 individuals. The three different residual distributions were: (1) Gaussian distribution, (2) asymmetric Gaussian distribution and (3) asymmetric Gaussian distribution with a hierarchical scheme for the asymmetry parameter. The operational model always included order of parity and herd-year-season as systematic effects, and the permanent environmental and infinitesimal genetic effect of each sow as random effects. The most suitable model using the deviance information criterion (DIC) and posterior predictive checking was model 3. This implies systematic, additive genetic and permanent environmental control of both litter size and the asymmetry parameter of the residual distribution. The asymmetry parameter can be understood as a measure of sensitivity to negative (or positive) environmental influences on phenotypes. The posterior mean (standard deviation) of the additive genetic variance was 0·28 (0·06) for litter size and 0·07 (0·01) for the asymmetry parameter. The posterior mean (standard deviation) of the additive genetic correlation between litter size and the asymmetry parameter was 0·21 (0·07).

Type: Paper
Information: Genetics Research , Volume 90 , Issue 2 , April 2008 , pp. 179 - 190

DOI: https://doi.org/10.1017/S0016672308009233 [Opens in a new window]
Copyright: Copyright © Cambridge University Press 2008

1. Introduction

Mixed linear models (Henderson, Reference Henderson1984) are used broadly in livestock and plant breeding to predict breeding values and to estimate variance components for traits of interest. The Gaussian distribution of the residual term is a common assumption in mixed linear models. In the animal breeding context, an alternative to the Gaussian assumption was proposed by Stranden & Gianola (Reference Stranden and Gianola1999), who modelled the residual term using a Student's t density. This kind of heavy-tailed distribution allows for more extreme residual values and, as a consequence, deviations from the Gaussian distribution such as preferential treatment (Kuhn et al., Reference Kuhn, Boettcher and Freeman1994) or other causes of outliers or abnormal phenotypic records (Jamrozik et al., Reference Jamrozik, Strandén and Schaeffer2004). Nevertheless, both Gaussian and Student's t distributions are symmetric, and little investigation into alternative approaches assuming a variable degree of skewness for the residual term has been done.

It is important to note that most of the uncontrolled sources of variation in animal production can be viewed as adverse factors involving a slight, moderate or even dramatic reduction in productive performance (e.g. pathologies, heat or cold, stress, fights and accidents), whereas favourable factors are probably limited to preferential treatment and social dominance hierarchy. Some authors have proposed the use of mixtures of distributions to model these peculiarities (Gianola et al., Reference Gianola, Heringstad and Odegaard2006), where observations can be assigned to different distributions (e.g. healthy vs. affected). However, it is very difficult to assign records to a finite number of distributions when sources of variation are unknown (e.g. preferential treatment, sub-clinical pathologies, unregistered heat or cold stress) and the statistical approach for infinite mixtures becomes very complex. An alternative to model these data is via the use of non-symmetric residual distribution for the environmental deviations, as was initially proposed by Fernandez & Steel (Reference Fernandez and Steel1998) and adapted to an animal breeding context by von Rohr & Hoeschele (Reference von Rohr and Hoeschele2002).

The use of skewed residual distributions in linear models has been focused mainly on describing the overall asymmetry in the population analysed (Fernandez & Steel, Reference Fernandez and Steel1998; von Rohr & Hoeschele, Reference von Rohr and Hoeschele2002). However, individual variation in the degree of asymmetry also seems plausible. Within this context, the asymmetry parameter could be modelled through a hierarchical model (Wakefield et al., Reference Wakefield, Smith, Racine-Poon and Gelfand1994; Varona et al., Reference Varona, Moreno, García-Cortes and Altarriba1997). Each record-specific asymmetry parameter would represent the ability to buffer against undesirable environmental influences and after accounting for genetic and environmental sources of variation. Indeed, this approach could be viewed as an attractive method to model robustness (or weakness) against controllable and uncontrollable genetic and environmental sources of variation. Within this context, there are several references in the literature regarding the genetic determinism of disease resistance (Gibson, Reference Gibson, Perry, Randolph, McDermont, Sones and Thornton2002; Bishop, Reference Bishop, Pond and Bell2004) and immune responses (Mallard et al., Reference Mallard, Wilkie, Kennedy, Gibson and Quinton1998; Henryon et al., Reference Henryon, Berg, Jensen and Andersen2001). Despite the crucial role that selection for disease resistance or robustness could play in animal breeding, not much attention has been focused on this question, mainly due to the difficulties involved in obtaining appropriate phenotypic records (Rothschild, Reference Rothschild, Owen and Axford1991).

Unfortunately, the Bayesian implementation of skewed distributions using the procedure suggested by Fernandez & Steel (Reference Fernandez and Steel1998) and von Rohr & Hoeschele (Reference von Rohr and Hoeschele2002), and involving Markov chain Monte Carlo (MCMC) techniques, requires a Metropolis--Hastings step (Hastings, Reference Hastings1970) to sample the asymmetry parameter. The development of a hierarchical model for the asymmetry parameter is therefore complex and computationally demanding for large data sets. However, other authors in statistics have developed new procedures for modelling non-symmetric distributions (Sahu et al., Reference Sahu, Dey and Branco2003; Jara & Quintana, Reference Jara and Quintana2007).

The aims of this study are: (1) to include non-symmetric residual distributions in the linear mixed models currently used in livestock and plant breeding following Sahu et al. (Reference Sahu, Dey and Branco2003), (2) to develop a hierarchical Bayesian scheme including systematic and additive genetic variation of the asymmetry parameter (Wakefield et al., Reference Wakefield, Smith, Racine-Poon and Gelfand1994; Varona et al., Reference Varona, Moreno, García-Cortes and Altarriba1997) and (3) to implement and compare these procedures with the standard mixed model approach using a litter-size data set from a pure-bred Landrace commercial population.

2. Materials and methods

(i) Statistical models

We took as a starting point the standard mixed model commonly used in animal breeding (Henderson, Reference Henderson1984):

(1)

where y is the vector of phenotypic data (number of piglets born alive (NBA)), β is the vector of systematic effects, p, u and e are the vectors of permanent environmental effects, additive genetic effects and residuals, respectively, and X, W and Z are the appropriate incidence matrices. Under a standard Bayesian approach, bounded uniform prior distributions between −50 and 50 units were assumed for β, and the following independent prior distributions were assumed for p and u:

(2)

(3)

where I_p is the appropriate identity matrix, A is the numerator relationship matrix between individuals and σ_p² and σ_u² are the permanent environmental and the additive genetic variances, respectively. In addition, for computational convenience, prior distributions for σ_p² and σ_u² were scale inverse chi-squared distributions with parameters s=0 and v=−2, which reduced it to a uniform distribution (Sorensen & Gianola, Reference Sorensen and Gianola2002). Moreover, it is computationally equivalent to a bounded proper prior between 0 and a huge and unreachable value. For the residual term, we considered three different prior distributions.

(a) A priori Gaussian distribution for the residual term (model 1)

The simplest model assumed a standard Gaussian distribution of residuals:

(4)

(5)

where n is the number of phenotypic records, I_e is an identity matrix with dimensions n×n, e _i is the ith term in e and σ_e² is the residual variance. As before, the prior distribution for σ_e² was a scale inverse chi-squared distribution with parameters s=0 and v=−2.

(b) A priori asymmetric Gaussian distribution for the residual term (model 2)

Following Sahu et al. (Reference Sahu, Dey and Branco2003), asymmetry in the residual term can be easily modelled by a skewed-normal density:

(6)

(7)

where λ is the degree of asymmetry defined in the real space and ɸ and Φ denote the density function and cumulative distribution function of a standard normal distribution with kernel as defined between parentheses, respectively.

Following Sahu et al. (Reference Sahu, Dey and Branco2003), the mean of the asymmetric Gaussian distribution is

(8)

the variance becomes

(9)

and the third central moment of the distribution is

(10)

Thus, the three parameters of the asymmetric Gaussian distribution are statistically identifiable from the first three moments of a given data set.

As before, the prior distribution for σ_e² was a scale inverse chi-squared distribution with parameters s=0 and v=−2. Finally, the prior distribution for λ was assumed flat between bounded limits (−50, 50).

(c) A priori asymmetric Gaussian distribution for the residual term with a hierarchical Bayesian scheme (model 3)

As suggested in the previous sections, the asymmetry parameter can be modelled under a hierarchical structure, with the a priori distribution of e being:

(11)

(12)

where λ is the vector of λ_i. A hierarchical model was assumed for λ such as

(13)

where β_λ, p_λ and u_λ are the vectors of systematic, permanent environmental and additive genetic effects, respectively. The prior distribution for each systematic effect of the asymmetry parameter is defined as a bounded uniform distribution between −50 and 50 units, and p_λ and u_λ are assumed to be correlated with p and u, respectively. Thus, the prior distributions for both effects were defined as:

(14)

(15)

where D and G are 2×2 permanent environmental and additive genetic (co)variance matrices, respectively. The following inverted Wishart distributions were assumed for G and D:

(16)

(17)

(ii) Field data

The models were tested on a data set consisting of NBA per litter from a pure-bred Landrace commercial pig population. The data set consisted of 63 208 litter-size records collected between 1982 and 1997 in six commercial farms from COPAGA SCCL (Lleida, Spain). Phenotypic data were from to 19 255 sows and the pedigree included 27 911 individuals. The raw mean was 9·04 piglets born alive with a standard deviation of 2·41 piglets. Data were grouped in six orders of parity (1, 2, 3, 4, 5 and >5) and 226 herd-year-season effects.

(iii) Bayesian implementation

The Bayesian implementation of the models was performed using a Gibbs sampler (Gelfand & Smith, Reference Gelfand and Smith1990). Full details of the conditional distributions needed for the implementation are presented in the Appendix. For each model, a single chain of 500 000 iterations was performed after discarding the first 50 000. Convergence was checked using the Raftery & Lewis (Reference Raftery, Lewis, Bernardo, Berger, Dawid and Smith1992) and Gelman et al. (Reference Gelman, Meng and Stern1996) procedures.

(iv) Sensitivity analysis

The influence of prior information on the posterior distribution has been tested under models 2 and 3. For model 2, the assumed prior for the degree of asymmetry was uniform, and model performance under two additional priors was studied,

(18)

(19)

Under model 3, the prior distribution for the additive variance components was assigned to an inverted Wishart distribution with parameters 0 and −3. Two alternative scenarios were considered:

(20)

(21)

(v) Model checking and model comparison

(a) Model checking

The fit of the statistical model to the data analysed can be assessed in a variety of ways. In a Bayesian context, a standard method for model checking involves the use of the posterior predictive distributions of discrepancies to diagnose particular failures of the model (Rubin, Reference Rubin1984; Gelman et al., Reference Gelman, Meng and Stern1996). Take T(y; θ) as a specific discrepancy measure allowing comparison of the posterior distribution of T(y^obs; θ) with the posterior predictive distribution of T(y^rep; θ). Here, y^obs is the observed data, y^rep is a simulated replicate of the data set at each iteration of the MCMC procedure, and θ represents the values sampled for all the parameters in the model in the given iteration. Systematic differences between T(y^obs; θ) and T(y^rep; θ) indicate a possible failing of the model.

In our particular case, we wanted to study the global discrepancy and the discrepancy associated with order of parity and sire family, and their relationship with the symmetry of the environmental variation under model 1. For global discrepancy, we defined the following measure of skewness:

(22)

where θ₁ represents all the unknown parameters in model 1, is the sampled value of the residual variance at each iteration and μ_i is the ith row in Xβ+Wp+Zu. The expected value of T(y^obs, θ₁)−T(y^rep, θ₁) under model 1 is zero, and values larger or smaller than 0 indicate asymmetry of the residuals. The degree of discrepancy was defined through the predictive P-values, calculated as the proportion of iterations where T(y^obs, θ₁)−T(y^rep, θ₁) was below zero (Gelman et al., Reference Gelman, Meng and Stern1996).

To study the discrepancy associated with the jth specific effect (order of parity or sire family), we also calculated the following measure:

(23)

At this point, N _j is the number of records for the jth effect and is the sampled value of the residual variance within the records for the jth effect at each iteration. As before, the expected value of T _j(y^obs, θ₁)−T _j(y^rep, θ₁) under model 1 is zero, and larger or smaller values indicate positive or negative asymmetry of the residuals. The degree of discrepancy was calculated through the predictive P-values.

(b) Model comparison

Models were also compared using the deviance information criterion (DIC) proposed by Spiegelhalter et al. (Reference Spiegelhalter, Best, Carlin and van der Linde2002). The DIC is defined as:

(24)

where is the vector of average values for all parameters in a given model (M) at the end of the sampling process,

(25)

(26)

with θ_M being the sampled values of all unknowns in model M in a given MCMC iteration. The DIC combines a measure of model fit () and a measure of model complexity () (Spiegelhalter et al., Reference Spiegelhalter, Best, Carlin and van der Linde2002). Models with smaller DIC exhibit a better fit.

(vi) Response to selection

We also used model 3 to infer the selection response for NBA and the asymmetry parameter. We calculated the posterior mean of the average breeding value corresponding to individuals born each year between 1981 and 1997, following the Bayesian techniques described by Sorensen et al. (Reference Sorensen, Wang, Jensen and Gianola1994). Furthermore, we also compared the expected selection gain using three different selection criteria in model 3: (1) breeding values for NBA, (2) breeding values for the degree of asymmetry and (3) a combined index with weights related to the potential increase in number of piglets. The expected litter size for a future individual can be calculated from , and we applied these weights for both breeding values in the selection index. We assumed directional selection for the top 20% of the pigs born after 1995. The procedure calculates the average breeding value at each iteration for the selected individuals. We considered selection on the basis of (1) breeding values for NBA under model 3, (2) breeding values for the degree of asymmetry and (3) a combined index with weights related to the potential increase in number of piglets.

3. Results and discussion

(i) Model fit and model comparison

Results from the study of model fit based on posterior predictive model checking are shown in Figs 1–3. Figure 1 presents the measure of global discrepancy showing that the posterior distribution of T(y, θ₁)−T(y^rep, θ₁) was centred at a negative value and did not include zero, their highest posterior density at 95% (HPD95), indicating a strong negative asymmetry of residuals under model 1. In fact, the posterior predictive P-value was lower than 10⁻⁶. Figure 2 presents the discrepancy measure associated with order of parity. As before, the posterior distribution of the discrepancy measure revealed negative asymmetry for each order of parity, with posterior predictive P-values lower than 10⁻⁶. Moreover, the posterior distributions of the measure of discrepancy for order of parity 1 differed considerably from the rest of the classes (i.e. orders of parity 2, 3, 4, 5 and 6; Fig. 2). This suggests that a model including differences between the degrees of asymmetry across systematic effects may be more plausible for the analysed data set. Finally, Fig. 3 shows discrepancy measures for the ten larger sire families. As in the previous case, the posterior estimates and posterior predictive P-values (lower than 10⁻⁶) indicate negative asymmetry. In addition, the non-negligible differences between some sire families (e.g. sire family 7 vs. 8), suggests a possible genetic determinism with regard to the degree of asymmetry.

Fig. 1. Boxplot for posterior predictive realizations of the discrepancy measure designed to test asymmetry in environmental variance.

Fig. 2. Boxplot of posterior predictive realizations of the discrepancy measure designed to test environmental variance heterogeneity due to order of parity.

Fig. 3. Boxplot of posterior predictive realizations of the discrepancy measure designed to test environmental variance heterogeneity due to sire family.

In strong concordance with the previous results concerning model fit, the Monte Carlo estimates of DIC for models 1, 2 and 3 were 139619·0, 138398·8 and 137757·8, respectively. Spielgelhalter et al. (Reference Spiegelhalter, Best, Carlin and van der Linde2002) considered differences in DIC of more than 7 to be important. Comparison based on DIC therefore favoured model 3 followed by model 2, and generally favoured the model that best captured the asymmetric pattern of the data.

(ii) Inferences on model parameters

The posterior mean and standard deviation estimates for the variance components under models 1 and 2 are presented in Table 1. The posterior estimates for the additive and permanent environmental variances were similar in the two models. On the contrary, the posterior mean estimate for σ_e² differed notably between model 1 (4·77, with a posterior standard deviation of 0·03) and model 2 (1·92 with a posterior standard deviation of 0·04). The posterior estimate was smaller under model 2 because of the presence of the asymmetry parameter on the skewed residual distribution. Following formula (9), the variance of the distribution depends on both the asymmetry parameter and the residual variance. The posterior mean estimate for the asymmetry parameter in model 2 was −2·79, with a posterior standard deviation of 0·02. Using expression (8), the posterior mean of the expectation of the asymmetric distribution was −2·23 piglets (with standard deviation of 0·02). This value can be understood as the loss of prolificacy due to environmental factors according to model 2, and agrees with previous assumptions suggesting a greater incidence of adverse uncontrolled environmental sources of variation than favourable ones. Finally, using formula (9), the posterior mean estimate of the variance for the asymmetric distribution under model 2 was 4·75 (with a posterior standard deviation of 0·04). As expected, this variance for model 2 was very similar to the residual variance under model 1.

Table 1. Monte Carlo estimates of posterior mean (and posterior standard deviation) for variance components and the degree of asymmetry under models 1 and 2

Results concerning the posterior distributions of variance components under model 3 are presented in Fig. 4. Posterior means for additive and permanent environmental variances affecting NBA were similar to those reported with models 1 and 2, but with greater standard deviations, due to increased complexity of the model. Under model 3, a different degree of asymmetry was peculiar to each of the data, and the average posterior mean estimate of the degree of asymmetry was −1·84 (with an empirical standard deviation of 1·91). Moreover, around 88% of the data were associated with negative posterior means for the degree of asymmetry.

Fig. 4. Posterior distributions of additive and permanent environmental variance for the NBA and the degree of asymmetry.

The posterior mean (and standard deviation) estimates for the additive genetic and permanent environmental variances of the degree of asymmetry were 0·07 (0·01) and 0·05 (0·01), respectively. The posterior probability over 0·04 of the additive variance component was 0·99. This provides evidence of the presence of additive genetic determinism in the individual degree of asymmetry, which can be interpreted as the indicator of genetic variability in robustness against unfavourable environmental effects affecting prolificacy. The posterior distributions of the additive genetic and permanent environmental correlations between NBA and the degree of asymmetry are presented in Fig. 5. These results suggest a slight, but positive, association between the NBA and resistance to environmental influences.

Fig. 5. Posterior distribution for genetic and permanent environmental correlations between the NBA and the degree of asymmetry.

The posterior mean estimates for order of parity effects with models 1, 2 and 3 are presented in Table 2. These results indicate that prolificacy increased until the fourth parity and decreased subsequently, confirming previous research findings (Kennedy & Moxley, Reference Kennedy and Moxley1978; Clark & Leman, Reference Clark and Leman1986; Noguera et al., Reference Noguera, Varona, Babot and Estany2002a). The posterior estimates for systematic effects with models 2 and 3 are higher than with model 1, as the former referred to the expectation of the asymmetric residual distribution, which is negative in the analysed data set. Estimates under models 2 and 3 can be understood as the potential NBA after the assumption of the asymmetric residual distribution, whose expectation is not zero. Posterior estimates in model 3 were lower than in model 2, consistent with the smaller estimates for the degree of asymmetry (−1·84 vs. −2·79). Posterior estimates for the degree of asymmetry associated with each order of parity were also obtained in model 3. The maximum degree of asymmetry was obtained in the first parity, indicating that younger sows were more sensitive to environmental stressors, as pointed out by several authors (Dagorn et al., Reference Dagorn, Saulnier and Greau1984).

Table 2. Monte Carlo estimates of posterior means (for order of parity effects for the NBA (models 1, 2 and 3) and degree of asymmetry (model 3))

The correlations between posterior mean estimates of breeding values for NBA were 0·99 between models 1 and 2, 0·92 between models 1 and 3 and 0·92 between models 2 and 3. From these results, the consequences of selection for NBA do not differ notably if we compare models 1 and 2, but more marked differences are expected if we use model 3. This model also provides the breeding values for the asymmetry parameter. For example, the female with the best breeding value for the asymmetry parameter showed a strong robustness against environmental influences along five parities, and had litter-size records of 11, 18, 16, 13 and 15 live-born piglets. On the contrary, the worst individual had a good performance in the first parity (ten piglets), but it suffered from negative environmental effects in the subsequent ones (NBA records 10, 5, 2 and 6).

(iii) Sensitivity analysis

The results of the sensitivity analysis to prior distributions are presented in Tables 3 and 4. Under model 2 (Table 3), estimates with alternative prior distributions for the degree of asymmetry were very similar. In all cases, even when the prior distribution was a Gaussian distribution with mean and variance equal to one, the posterior distribution placed its density in the negative space. This fact indicates that the likelihood (data) is very informative for the degree of asymmetry and it dominates clearly over the prior distribution.

Table 3. Monte Carlo estimates of posterior mean (and posterior standard deviation) for variance components and the degree of asymmetry under model 2 and priors (a) (uniform), (b) (N(1, 1)) and (c) (N(−1, 1)) for the degree of asymmetry

Table 4. Monte Carlo estimates of posterior mean (and posterior standard deviation) for variance components, genetic and permanent environmental correlations under model 3 with priors (Footnote a), (Footnote b) and (Footnote c)

(a) .

(b) .

Under model 3 (Table 4), the results were to some extent different under several prior specifications. With prior (b), the posterior mean estimates for the additive variance components for both NBA and the degree of asymmetry were higher than with prior (a). On the other hand, the opposite effect is observed with prior (c), for which the posterior mean estimates of additive variances decreased. The results were coherent with the prior specifications. In both cases, the posterior distribution moves slightly towards the prior, but the information provided by the likelihood still dominates the prior.

(vi) Experienced and expected response to selection

The evolution of the breeding values for NBA and the degree of asymmetry from 1981 to 1997 is presented in Fig. 6. From 1981 to 1992, there was a positive selection response for asymmetry and a flat or slightly negative selection response for the NBA. On the contrary, the tendency was the opposite from 1992 to 1997, and the selection response was mainly associated with the NBA. These results were in strong agreement with the selection background of the population. Until 1992, selection was performed by the farmers, who culled less productive individuals, whereas from 1992 onwards, selection was performed using BLUP procedures with model 1 (Noguera et al., Reference Noguera, Varona, Babot and Estany2002b). Culling of less productive animals is related to the asymmetry parameter, because the reason for culling is mainly related to extremely low NBA caused by environmental effects. If the asymmetry parameter has some degree of genetic determinism and it is related to robustness to undesirable genetic effects, a selection response would be expected. Thus, the evidence of genetic change in the asymmetry parameter observed in Fig. 6 is in agreement with the genetic determinism suggested by the variance component estimation presented above. On the other hand, the empirical correlation between breeding values calculated with model 1 and model 3 was 0·92 and so it is expected that selection on breeding values from model 1 had determined a positive genetic response under model 3, as observed in Figure 6 for the period from 1992 onwards.

Fig. 6. Selection response for the NBA and the degree of asymmetry.

Regarding the expected response to selection, selection on breeding values for NBA (i.e. omitting the genetic background of the asymmetry parameter) implied an increase of 0·44 piglets per parity. When selection was exclusively applied to breeding values of the asymmetry parameter, the expected improvement was 0·14 piglets per parity. Finally, a selection index with both breeding values produced a genetic response of 0·48 piglets. Thus, the selection based on an index that combines both breeding values resulted in a 10% increase in the expected selection response with respect to selection based on breeding values for NBA exclusively.

(v) Final remarks

The proposed model allows taking into account the differential sensitivity to unfavourable environmental influences in the genetic evaluation by including breeding values for the asymmetry parameter. They are related to the robustness of the individuals against sources of stress. Sensitivity to environmental sources of stress could have important economic consequences, not only for prolificacy, but also for a plethora of economically related traits, for which selection on the asymmetry parameter could therefore imply additional benefits. Further research must be conducted on reproductive and growth traits in pigs simultaneously.

The proposed model can be extended to include some additional features. First, the Gaussian distribution can be replaced easily by a more robust distribution, such as Student's t distribution (Stranden & Gianola, Reference Stranden and Gianola1999), which can account for divergence from the Gaussian distribution explained by preferential treatment or other possible phenomena. Moreover, this procedure provides an alternative to model heterogeneous residual variances and it should be compared with the methods proposed by SanCristobal-Gaudy et al. (Reference SanCristobal-Gaudy, Elsen, Bodin and Chevalet1998) and Sorensen & Waagepetersen (Reference Sorensen and Waagepetersen2003). Furthermore, it is also possible to combine both strategies, although the resulting model would be extremely complex and difficult to interpret. Another possible extension of the model involves the use of the asymmetric Gaussian distribution for other random effects in the model. Hence, the asymmetry of the additive breeding values can be explained by the presence of major genes with extreme frequency (Falconer & Mackay, Reference Falconer and Mackay1996), or by the presence of semi-deleterious mutations (García-Dorado et al., Reference García-Dorado, López-Fanjul and Caballero1999). Both can lead to asymmetric genetic responses of the type reported in the literature (Argente et al., Reference Argente, Santacreu, Climent, Bolet and Blasco1997; Zhang et al., Reference Zhang, Aggrey, Pesti, Bakalli and Edwards2005).

Financial support was provided by IRTA (grant number IRTA 21022). We are indebted to the staff of Nova Genética and COPAGA, in particular to E. Ramells, F. Marquez, R. Malé and F. Rovira, for technical support. We also thank R. Pena for useful comments.

Appendix. Conditional distributions required for the Gibbs sampler

(i) Model 1

Models were implemented by using MCMC techniques. The implementation of model 1 consisted of a standard application of the Gibbs sampler for a linear mixed model (Wang et al., Reference Wang, Rutledge and Gianola1994). The conditional distributions involved in the analysis were univariate Gaussian distributions for the additive genetic and permanent environmental effects and scaled inverse chi-squared distributions for the additive genetic, permanent environmental and residual variance components. For the systematic effects, the conditional distributions were truncated Gaussian distributions with the bounds of the assumed uniform prior. Computationally, it is equivalent to a Gaussian distribution when the bounds are far enough.

(ii) Model 2

Under model 2, the marginal distribution of the residuals can be reparameterized by adding a vector of auxiliary variables (t), following Sahu et al. (Reference Sahu, Dey and Branco2003) :

(A1)

With the parameterization described above, residuals can be reparameterized as e=λt+e* by using a data-augmentation step (Tanner & Wong, Reference Tanner and Wong1987), where prior distributions are assumed to be

(A2)

and

(A3)

where HN(.) is a positive half-normal standard distribution and I the appropriate identity matrix.

The implementation of MCMC for model 2 involved sampling of the conditional distribution of the asymmetry parameter (λ). It must be noted that, after adding the vector of auxiliary parameters (t), the model can be rewritten as:

(A4)

Then, the conditional distribution of λ is the following univariate Gaussian distribution:

(A5)

where x_i′, w_i′ and z_i′ are the ith rows of X, W and Z, respectively. The conditional distribution for t _i is generated from the conditional likelihood and the half-normal prior distribution. After multiplication, they produced the following half-normal distribution:

(A6)

defined for values between 0 and infinity. The remaining conditional distributions are the same as in model 1.

Model 3

As in the previous case, the model can be transformed into the following expression by using a vector of auxiliary parameters (t):

(A7)

where λ is the vector of λ_i.

The implementation of model 3 is similar to that of model 2, the conditional distributions for t _i being the following half-normal distribution defined between 0 and infinity:

(A8)

where λ_i=x_i′β_λ+w_i′p_λ+z_iu_λ.

The conditional distribution of each level of β_λ, u_λ and p_λ are obtained from the joint distribution of all the unknowns in the model, after conditioning on the rest of parameters. The conditional distribution of a given element of β_λ(β_λi) is the following Gaussian distribution:

(A9)

where β_λ−i is the vector of systematic effects for the degree of asymmetry without β_λi and Nβ_i is the number of records influenced by the ith systematic effect.

The conditional distribution of u _λi is proportional to the product of two normal distributions; the first one comes from the conditional likelihood:

(A10)

where Nu_i is the number of records associated with the ith additive genetic effect, and the second Gaussian distribution is provided by the prior information of the breeding values:

(A11)

where is the ith row of the inverse of the numerator relationship matrix, g ^mn is the element in the mth row and nth column of the inverse of the additive genetic (co)variance matrix (G), and u_λ(−i) is the vector of breeding values for the degree of asymmetry without the ith element. Then, the conditional distribution of u _λi is:

(A12)

Similarly, the conditional distribution of p _λi is proportional to the product of the two normal distributions:

(A13)

where Np _i is the number of records associated with the ith permanent effect and d ^mn is the element in the mth row and nth column of the inverse of D.

Finally, the conditional distributions of β, p and u are univariate Gaussian distributions, the conditional distributions for G and D are inverted Wishart distributions and the conditional distribution for the residual variance component is an inverted chi-squared distribution.

References

Argente, M. J., Santacreu, M. A., Climent, A., Bolet, G. & Blasco, A. (1997). Divergent selection for uterine capacity in rabbits. Journal of Animal Science 75, 2350–2354.CrossRef Google Scholar PubMed

Bishop, S. C. (2004). Disease resistance: genetics. In Encyclopedia of Animal Science (ed. Pond, W. G. & Bell, A. W.), pp. 288–290. Ithaca, NY: Marcel Dekker.CrossRef Google Scholar

Clark, L. K. & Leman, A. D. (1986). Factors that influence litter size in pigs: part 1. Pig News and Information 7, 303–310.Google Scholar

Dagorn, J., Saulnier, J. & Greau, P. (1984). Evolution et variation de la prolificité entre la première et la seconde portée. Journées de la Recherche Porcine en France 16, 145–152.Google Scholar

Falconer, D. S. & Mackay, T. F. C. (1996). Introduction to Quantitative Genetics. New York: Longman.Google Scholar

Fernandez, C. & Steel, M. F. J. (1998). On Bayesian modelling of fat tails and skewness. Journal of American Statistics Association 93, 359–371.Google Scholar

García-Dorado, A., López-Fanjul, C. & Caballero, A. (1999). Properties of spontaneous mutations affecting quantitative traits. Genetical Research 74, 341–350.CrossRef Google Scholar PubMed

Gelfand, A. E. & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of American Statistical Association 85, 398–409.CrossRef Google Scholar

Gelman, A., Meng, X. L. & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica 6, 733–807.Google Scholar

Gianola, D., Heringstad, B. & Odegaard, J. (2006). On the quantitative genetics of mixture characters. Genetics 173, 2247–2255.CrossRef Google Scholar PubMed

Gibson, J. P. (2002). Role of genetically determined resistance of livestock to disease in the developing world: potential impacts and researchable issues. Appendix. In Investing in Animal Health Research to Alleviate Poverty (ed. Perry, B. D., Randolph, T. F., McDermont, J. J., Sones, K. R. & Thornton, P. L.), p. 14. Nairobi, Kenya: International Livestock Research Institute.Google Scholar

Hastings, W. K. (1970). Monte-Carlo methods using Markov chains in their applications. Biometrika 57, 97.CrossRef Google Scholar

Henderson, C. R. (1984). Applications of Linear Models in Animal Breeding. Guelph, ON, Canada: University of Guelph Press.Google Scholar

Henryon, M., Berg, P., Jensen, J. & Andersen, S. (2001). Genetic variation for resistance to clinical and subclinical diseases exists in growing pigs. Animal Science 73, 375–387.CrossRef Google Scholar

Jamrozik, J., Strandén, I. & Schaeffer, L. R. (2004). Random regression test-day models with residuals following a Student's t-distribution. Journal of Dairy Science 87, 699–705.CrossRef Google Scholar PubMed

Jara, A. & Quintana, F. (2007). Linear effects mixed models with skew-elliptical distributions: a Bayesian approach. Computational Statistics and Data Analysis (submitted).Google Scholar

Kennedy, B. W. & Moxley, J. E. (1978). Genetic and environmental factors influencing litter size, sex-ratio and gestation length in pig. Animal Production 27, 35–42.Google Scholar

Kuhn, M. T., Boettcher, P. J. & Freeman, A. E. (1994). Potential biases in predicted transmitting abilities of females from preferential treatment. Journal of Dairy Science 77, 2428–2437.CrossRef Google Scholar

Mallard, B. A., Wilkie, B. N., Kennedy, B. W., Gibson, J. & Quinton, M. (1998). Immune responsiveness in swine: eight generations of selection for high and low immune response in Yorkshire pigs. In Proceedings of the 6th World Congress on Genetics Applied to Livestock Production, pp. 257–264. Armidale, NSW, Australia: University of New England.Google Scholar

Noguera, J. L., Varona, L., Babot, D. & Estany, J. (2002 a). Multivariate analysis of litter size for multiple parities with production traits in pigs: I. Bayesian variance component estimation. Journal of Animal Science 80, 2540–2547.Google Scholar PubMed

Noguera, J. L., Varona, L., Babot, D. & Estany, J. (2002 b). Multivariate analysis of litter size for multiple parities with production traits in pigs: II. Response to selection for litter size and correlated response to production traits. Journal of Animal Science 80, 2548–2555.Google Scholar PubMed

Raftery, A. E. & Lewis, S. M. (1992). How many iterations on the Gibbs sampler? In Bayesian Statistics 4 (ed. Bernardo, J. M., Berger, J. O., Dawid, A. P. & Smith, A. F. M.), pp. 763–774. Oxford: Oxford Science Publications.CrossRef Google Scholar

Rothschild, M. F. (1991). Selection under challenging environments. In Breeding for Disease Resistance in Farm Animals (ed. Owen, J. B. & Axford, R. F. E.), pp. 73–85. Wallingford, UK: CAB International.Google Scholar

Rubin, D. B. (1984). Bayesian justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 12, 1151–1172.CrossRef Google Scholar

Sahu, S. K., Dey, D. K. & Branco, M. D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics 31, 129–150.CrossRef Google Scholar

SanCristobal-Gaudy, M., Elsen, J.-M., Bodin, L. & Chevalet, C. (1998). Prediction of response to a selection for canalisation of a continuous trait in animal breeding. Genetics Selection Evolution 30, 423–451.CrossRef Google Scholar

Sorensen, D. A. & Gianola, D. (2002). Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. New York: Springer.CrossRef Google Scholar

Sorensen, D. A. & Waagepetersen, R. (2003). Normal linear models with genetically structured variance heterogeneity: a case study. Genetical Research 82, 207–222.CrossRef Google Scholar PubMed

Sorensen, D. A., Wang, C. S., Jensen, J. & Gianola, D. (1994). Bayesian analysis of genetic change due to selection using Gibbs sampling. Genetics Selection Evolution 26, 333–360.CrossRef Google Scholar

Spiegelhalter, D. J., Best, N. G., Carlin, B. P. & van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society B 64, 583–639.CrossRef Google Scholar

Stranden, I. & Gianola, D. (1999). Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach. Genetics Selection Evolution 31, 25–42.CrossRef Google Scholar

Tanner, M. A. & Wong, H. W. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82, 528–540.CrossRef Google Scholar

Varona, L., Moreno, C., García-Cortes, L. A. & Altarriba, J. (1997). Multiple trait analysis of underlying biological variables of production functions. Livestock Production Science 47, 201–209.CrossRef Google Scholar

von Rohr, P. & Hoeschele, I. (2002). Bayesian QTL mapping using skewed Student t-distributions. Genetics Selection Evolution 34, 1–21.CrossRef Google Scholar PubMed

Wakefield, J. C., Smith, A. F. M., Racine-Poon, A. & Gelfand, A. E. (1994). Bayesian analysis of linear and nonlinear population models using the Gibbs sampler. Applied Statistics 43, 201–221.CrossRef Google Scholar

Wang, C. S., Rutledge, J. J. & Gianola, D. (1994). Marginal inference about variance components in a mixed model using Gibbs sampling. Genetics Selection Evolution 26, 91–115.CrossRef Google Scholar

Zhang, W., Aggrey, S. E., Pesti, G. M., Bakalli, R. I. & Edwards, H. M. (2005). Genetic analysis on the direct response to divergent selection for phytate phosphorus bioavailability in a randombred chicken population. Poultry Science 84, 370–375.CrossRef Google Scholar

Fig. 1. Boxplot for posterior predictive realizations of the discrepancy measure designed to test asymmetry in environmental variance.

Fig. 2. Boxplot of posterior predictive realizations of the discrepancy measure designed to test environmental variance heterogeneity due to order of parity.

Fig. 3. Boxplot of posterior predictive realizations of the discrepancy measure designed to test environmental variance heterogeneity due to sire family.

Table 1. Monte Carlo estimates of posterior mean (and posterior standard deviation) for variance components and the degree of asymmetry under models 1 and 2

Fig. 4. Posterior distributions of additive and permanent environmental variance for the NBA and the degree of asymmetry.

Fig. 5. Posterior distribution for genetic and permanent environmental correlations between the NBA and the degree of asymmetry.

Table 2. Monte Carlo estimates of posterior means (for order of parity effects for the NBA (models 1, 2 and 3) and degree of asymmetry (model 3))

Fig. 6. Selection response for the NBA and the degree of asymmetry.

Article contents

Bayesian analysis of quantitative traits using skewed distributions

Summary

1. Introduction

2. Materials and methods

(i) Statistical models

(a) A priori Gaussian distribution for the residual term (model 1)

(b) A priori asymmetric Gaussian distribution for the residual term (model 2)

(c) A priori asymmetric Gaussian distribution for the residual term with a hierarchical Bayesian scheme (model 3)

(ii) Field data

(iii) Bayesian implementation

(iv) Sensitivity analysis

(v) Model checking and model comparison

(a) Model checking

(b) Model comparison

(vi) Response to selection

3. Results and discussion

(i) Model fit and model comparison

(ii) Inferences on model parameters

(iii) Sensitivity analysis

(vi) Experienced and expected response to selection

(v) Final remarks

Appendix. Conditional distributions required for the Gibbs sampler

(i) Model 1

(ii) Model 2

Model 3

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests