Review: Opportunities and challenges for small populations of dairy cattle in the era of genomics

K. Schöpke; H. H. Swalve

doi:10.1017/S1751731116000410

Published online by Cambridge University Press: 09 March 2016

K. Schöpke*: Saxon State Stud Administration, Schlossallee 1, 01468 Moritzburg, Germany
H. H. Swalve: Institute of Agricultural and Nutritional Sciences, Martin-Luther-University Halle-Wittenberg, Theodor-Lieser-Str. 11, D-06120 Halle/Saale, Germany
* E-mail: kati.schoepke@web.de

Abstract

In modern dairy cattle breeding, genomic breeding programs have the potential to increase efficiency and genetic gain. At the same time, the requirements for and the availability of genotypes and phenotypes present a challenge. The set-up of a large enough reference population for genomic prediction is problematic for numerically small breeds but also for hard-to-measure traits. The first part of this study is a review of the current literature on strategies to overcome the lack of reference data. One solution is the use of combined reference populations from different breeds, different countries, or different research populations. Results reveal that the level of relationship between the merged populations is the most important factor. Compiling closely related populations facilitates the accurate estimation of marker effects and thus results in high accuracies of genomic prediction. Consequently, mixed reference populations of the same breed but from different countries are more promising than combinations of different breeds, especially if those breeds are more distantly related. The use of female reference information has the potential to enlarge the reference population size. Including females is advisable for small populations and difficult traits, and may be combined with genotyping females and imputing those that are un-genotyped.

The efficient use of imputation for un-genotyped individuals requires a set of genotyped related animals and well-considered strategies for selecting which animals to genotype and phenotype. Small populations have to find ways to derive additional advantages from the cost-intensive establishment of genomic breeding schemes. Possible solutions may be the use of genomic information for inbreeding control, parentage verification, within-herd selection, adjusted mating plans or conservation strategies.

The second part of the paper deals with the issue of high-quality phenotypes against the background of new, difficult and hard-to-measure traits. The use of contracted herds for phenotyping is recommended, as traits additional to the standard traits used in dairy cattle breeding can be measured at set moments in time. This can be undertaken even for the recording of health traits, thus resulting in complete contemporary groups for health traits. Future traits to be recorded and used in genomic breeding programs will, at least partly, be traits for which traditional selection based on widespread phenotyping is not possible. Phenotyping sufficient numbers of animals to enable genomic selection will rely on cooperation between scientists from different disciplines and may require multidisciplinary approaches.

Keywords

genomic selection, reference sample, new phenotypes, health traits

Type: Review Article
Information: animal, Volume 10, Issue 6, June 2016, pp. 1050-1060

DOI: https://doi.org/10.1017/S1751731116000410
Copyright: © The Animal Consortium 2016

Implications

Genomic selection uses knowledge of variants found in the DNA and of their association with phenotypic records or breeding values to derive genomic breeding values, and then uses these estimated breeding values in the genetic selection of candidates with and without available phenotypic records. For this method, the size and the effective population size of a so-called reference population with genotypic and phenotypic information are key parameters. In small populations, a reference population large enough to provide a sufficient amount of information is difficult to obtain when using only males. To achieve this goal, the use of information from other populations, or the genotyping and phenotyping of females in addition to male animals, are possible solutions discussed in this review paper.

Introduction

The introduction of genomic selection made it feasible to obtain breeding values early in life and this has substantially changed dairy cattle breeding schemes. Making selection decisions early in the life of an animal, instead of relying on prolonged progeny testing, reduces the generation interval and thus leads to a significant increase in yearly genetic gain. The application of genomic breeding programs demands a huge investment in the required infrastructure for the initial setup and a continuous process of genotyping, data management, estimating genomic breeding values and selection. Apart from a high start-up cost, the routine implementation of genomic selection into dairy cattle breeding schemes has the potential for enormous savings in costs for keeping bulls from the time of birth up to an age of 5 years compared with progeny testing schemes, as seen in Holstein populations and as suggested by Schaeffer (2006).
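The trade-off between a shorter generation interval and a lower selection accuracy can be made concrete with the classical breeder's equation, in which yearly genetic gain equals selection intensity times accuracy times genetic standard deviation, divided by the generation interval. The following minimal sketch uses hypothetical parameter values chosen only for illustration; they are assumptions and are not taken from Schaeffer (2006) or any other study cited here.

```python
def annual_genetic_gain(intensity, accuracy, sigma_a, generation_interval):
    """Breeder's equation: expected genetic gain per year.

    intensity            selection intensity (standardised units)
    accuracy             correlation between estimated and true breeding value
    sigma_a              additive genetic standard deviation of the trait
    generation_interval  average age of parents at birth of offspring (years)
    """
    return intensity * accuracy * sigma_a / generation_interval


# Hypothetical values for illustration only (assumptions, not published estimates).
progeny_testing = annual_genetic_gain(
    intensity=2.0, accuracy=0.90, sigma_a=1.0, generation_interval=6.0
)
genomic_selection = annual_genetic_gain(
    intensity=2.0, accuracy=0.60, sigma_a=1.0, generation_interval=2.5
)

print(f"progeny testing:   {progeny_testing:.2f} sigma_a per year")
print(f"genomic selection: {genomic_selection:.2f} sigma_a per year")
```

Under these assumed values the genomic scheme achieves the higher yearly gain despite its lower accuracy, because the generation interval is more than halved.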

For small dairy cattle populations, the situation is more challenging, since costs of the breeding program can become disproportionately high. Additionally, the respective breeding companies often have to deal with more than the usual challenge of achieving the highest genetic gain in comparison with competitors. There are also other, conflicting goals such as genetic diversity, genetic uniqueness or specific local conditions. However, even in small dairy cattle populations, genomic breeding schemes are genetically (Kariuki et al., 2014; Thomasen et al., 2014a) as well as economically superior to conventional breeding schemes (Thomasen et al., 2014a). This superiority is lower than estimated for larger populations, as the major limitation for small populations is the comparatively low accuracy of genomic predictions (Kariuki et al., 2014; Thomasen et al., 2014a). Consequently, it is difficult to implement more efficient and at the same time more cost-effective breeding schemes (Thomasen et al., 2014a), although, in terms of genetic gain, reduced accuracies are partially compensated by shortened generation intervals (Kariuki et al., 2014).

In conclusion, the main focus of small dairy cattle breeds should be to find ways to increase the reliability of genomic predictions. In fact, there are various factors that directly or indirectly influence the reliability of genomic predictions. In order to maximize reliability, characteristics of the trait, characteristics of the reference population, properties of the overall population, as well as interactions between these parameters need to be considered (Figure 1). There is now a substantial body of research on the impact of one or more of these factors.

Figure 1 Parameters and interactions between parameters directly or indirectly influencing the accuracy of genomic prediction in dairy cattle.
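As a rough numerical illustration of how reference population size and heritability interact, the sketch below uses a widely cited deterministic approximation of prediction accuracy, r = sqrt(N h² / (N h² + M_e)). This formula is not taken from this review. M_e (the number of independent chromosome segments, which depends on effective population size and genome length) and all parameter values are assumptions chosen for illustration, and the approximation ignores relationships between reference animals and selection candidates.

```python
import math


def expected_accuracy(n_ref, h2, m_e):
    """Approximate accuracy of genomic prediction.

    n_ref  number of reference animals with genotype and phenotype
    h2     heritability (or reliability) of the phenotype used
    m_e    assumed number of independent chromosome segments
    """
    return math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))


M_E = 1000  # hypothetical value, illustration only

for h2 in (0.05, 0.30):
    for n_ref in (500, 5000, 50000):
        acc = expected_accuracy(n_ref, h2, M_E)
        print(f"h2={h2:.2f}  N={n_ref:>6}  accuracy={acc:.2f}")
```

With these assumed inputs, a low-heritability trait needs a far larger reference population than a moderately heritable trait to reach the same accuracy, which is the situation small populations and hard-to-measure traits face.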

The main characteristic of small populations is ‘limited information.’ Different reasons can lead to a situation with limited information. First, there are small populations in a narrower sense: numerically small breeds, for example national populations of small countries with a low number of individuals, or rare and endangered breeds. Second, in a broader sense, limited information results from a lack of phenotypes, even in situations where the population size is large. Scarcity of phenotypic data can be a problem when traits are difficult or costly to measure, for example hormone profiles, methane measurements or antibody response. For traits that can only be assessed very late in the life of the animal (e.g. length of productive life), or even after the end of its life (e.g. meat quality), a direct evaluation is difficult and thus large-scale phenotype data are often missing. Furthermore, sex-limited traits (for example milk production or female/male fertility) and newly established traits can be problematic in terms of extensive data collection. The situations described so far can result in a limited amount of information for the genotype data, for the phenotype data or for both.

The objective of the first part of this paper is to present and discuss different opportunities to enlarge the size and optimize the composition of reference populations. Thus, focus is on three approaches: first, genomic predictions in a multi-breed context and in a multi-country context; second, enlarging the reference population with the use of female genotype information; third, imputation of completely un-genotyped animals. The second part of this paper reviews the evaluation of new and promising phenotypes within the context of limited recording.

Relevance of linkage disequilibrium and relationship level

Genomic selection is based on the fact that the markers used are in linkage disequilibrium (LD) with causal variants. LD, also called gametic phase disequilibrium, can be induced by selection, when one combination of alleles is favored over another. Under this positive selection the mating is non-random and the frequencies of advantageous alleles will increase (Falconer and Mackay, 1996), which is also known as ‘the hitch-hiking effect of a favorable gene’ (Smith and Haigh, 1974). LD can also arise from the intermixture of populations with different gene frequencies. In small populations, LD can also arise by chance, since genetic drift influences allele frequencies and haplotype frequencies for alleles of neighboring markers (Falconer and Mackay, 1996). Thus, variability can decrease and LD can be generated and strengthened, which is especially relevant for small populations.

In general, related individuals have a higher probability of sharing alleles and haplotypes from common ancestors than unrelated individuals do. Consequently, the pattern of LD within a population depends on the historical development of the population, especially the development of its effective population size (Sved, 1971). The effective population size of Bos taurus cattle decreased from >50 000 to between 1000 and 2000 during the domestication process. With the formation of breeds, intense selection, and inbreeding, many cattle breeds reached an N_e of ~100 (de Roos et al., 2008; Kemper and Goddard, 2012). Consequently, there are large chromosome segments that are identical by descent and thus long-range LD exists within breeds (Goddard and Hayes, 2009). Kemper et al. (2015) demonstrated in a study on Holstein and Jersey that quantitative trait loci (QTL) segregating across breeds often arise from older mutations that appeared several tens of thousands of generations ago. However, only a few QTL segregate across populations (Erbe et al., 2012). In fact, Holstein-specific QTL are often parts of long haplotypes, which indicates a more recent occurrence of the causative mutation (Kemper et al., 2015). Accordingly, long-range LD does not necessarily exist between breeds, where LD only extends across short regions (Goddard and Hayes, 2009). Sometimes the direction of the single-nucleotide polymorphism (SNP) effect can even switch between breeds, in such a way that a specific SNP allele may be associated with an unfavorable QTL allele in a different population, and therefore opposite SNP effects may be obtained.
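The notion of LD, and of a marker-QTL phase that differs between breeds, can be illustrated with a small calculation. The sketch below computes the disequilibrium coefficient D and the squared correlation r² between two biallelic loci from phased haplotypes. The two toy data sets are invented for illustration and do not represent real breeds; the function name is likewise hypothetical.

```python
def ld_statistics(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (marker_allele, qtl_allele) tuples, alleles coded 0/1.
    D  = p(11) - p(marker=1) * p(qtl=1)
    r2 = D^2 / (pA * (1 - pA) * pB * (1 - pB))
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Invented toy haplotypes: in "breed 1" marker allele 1 travels mostly with
# QTL allele 1; in "breed 2" it travels mostly with QTL allele 0.
breed_1 = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
breed_2 = [(1, 0)] * 4 + [(0, 1)] * 4 + [(1, 1), (0, 0)]

d1, r2_1 = ld_statistics(breed_1)
d2, r2_2 = ld_statistics(breed_2)
print(f"breed 1: D={d1:+.2f}, r2={r2_1:.2f}")
print(f"breed 2: D={d2:+.2f}, r2={r2_2:.2f}")
```

In this constructed example both populations show the same strength of LD (equal r²), but D has opposite signs, so a marker effect estimated in one population would point in the wrong direction in the other.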

In populations with a high relationship level, the LD information originating from genetic relationships helps in achieving a high reliability of genomic prediction even though some markers and causal variants are in imperfect LD. The latter fact may cause a loss of information. With decreasing relationship level, the information loss increases (de los Campos et al., 2013). Consequently, the relationship level between the reference population and the selection candidates strongly influences the reliability of genomic predictions (de los Campos et al., 2013; Wientjes et al., 2013) and is more important than the LD per se (Wientjes et al., 2013).

For small populations, there is an interaction between the size of the reference population and the relative contributions of LD and family relationships to the accuracy of genomic prediction (Clark et al., 2012). The smaller the reference population, the larger the effect of family relationships compared with the effect of LD. It has been demonstrated repeatedly that the accuracy of genomic prediction strongly depends on the level of relationship between the reference population and the test population (Habier et al., 2010; Daetwyler et al., 2012; Pszczola et al., 2012; Wientjes et al., 2013). However, in larger reference populations, the effect of family relationships decreases (Clark et al., 2012) and similarities in allele frequencies, haplotypes and LD patterns become more important. Consequently, the requirement for an effective use of mixed reference populations is that either a noteworthy proportion of the LD is captured due to close linkage between marker and QTL, or the populations involved are closely related due to recent genetic exchange (Lund et al., 2014).

Beyond improving the understanding of the genetic architecture of a trait, sequence data can aid in improving the accuracy of genomic selection. However, as has been shown by Brøndum et al. (2015), the change in accuracy may be limited to a 1% to 5% increase. MacLeod et al. (2014) showed that a benefit from the use of sequence data can mainly be achieved in populations with a large effective population size and/or a comparatively low level of LD, while in populations like the Holstein population the increase in accuracy of genomic breeding values may be very small. Hence, neither condition, a large effective size or a low level of LD, is fulfilled for typical small populations as seen in livestock. Sequence data can also be used for the identification of causal variants of genes affecting traits of interest, and causal variants could be included in SNP panels across breeds. However, the identification of causal variants requires suitable phenotypes that are defined in a way that reflects the physiological background and allows genotypes to be distinguished. As costs for phenotyping in large populations may be scalable, leading to lower costs per phenotype, the identification of causal variants in large populations could contribute to an improved use of genomics in small populations, which by their own cost structure would not be capable of implementing the desired ways of phenotyping.

Combining populations from different countries or different breeds

From the very beginning of the genomic selection era, cooperation between different populations in the form of joint reference populations was an essential point. Since the availability of large numbers of animals with phenotypes and genotypes is essential for successful genomic prediction (Goddard, 2009), cooperation is helpful and even necessary for initial processes. Thus, collaborations have already been established, such as the EuroGenomics consortium (Lund et al., 2010) or the collaboration between the United States, Canada, Italy and the United Kingdom, in which genotypes for dairy cattle breeds have been shared since 2007 (Schenkel et al., 2009; VanRaden et al., 2012). These collaborations benefitted from the close relationship between individuals from the participating populations due to exchange of genotypes and therefore, by enlarging the reference population, reached a higher level of reliability for genomic predictions (Lund et al., 2011).

Meanwhile, there is a substantial body of research on the consequences of combining populations from different countries or different breeds into one joint, and thus extended, reference population (Lund et al., 2014). In terms of genomic prediction accuracy, the results of these studies vary widely. The reported increase in accuracy has reached up to 32% (Zhou et al., 2013), but losses in reliability have also been observed when using joint reference populations (e.g. Erbe et al., 2012). The following section gives an overview of studies of reference populations compiled from different cattle populations.

One possibility is to combine populations of the same breed, but from different countries. Studies evaluating this scenario predominantly report a gain in reliability. Several analyses examine the combination of a US-American population with a foreign population of the same breed to form a joint reference population. VanRaden et al. (2012) analyzed milk production traits and calculated a gain in accuracy of 2% when enlarging the US-American Holstein Friesian reference population by 24% (from 10 534 bulls and 22 800 cows to 18 508 bulls and 22 800 cows) with bulls from Canada, Italy and the United Kingdom. Enlarging the US-American Brown Swiss population by 73% (from 812 bulls and 374 cows to 1682 bulls and 374 cows) with bulls from Austria, Germany and Switzerland resulted in an increase in reliability between 1% and 5% (VanRaden et al., 2012). Moreover, the US-American Jersey population benefitted by 2% from the inclusion of Danish Jersey bulls, whereas in turn the Danish bulls showed a gain of 10% in reliability (Wiggans et al., 2015).
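The enlargement percentages quoted above follow directly from the animal counts given in parentheses, when bulls and cows are counted together. A minimal check, using only the numbers stated in the text (the function name is an illustrative choice):

```python
def relative_increase(before, after):
    """Percentage increase in reference population size."""
    return 100.0 * (after - before) / before


# Counts as quoted in the text (bulls + cows before and after enlargement).
holstein = relative_increase(10534 + 22800, 18508 + 22800)
brown_swiss = relative_increase(812 + 374, 1682 + 374)

print(f"Holstein Friesian reference enlarged by {holstein:.1f}%")
print(f"Brown Swiss reference enlarged by {brown_swiss:.1f}%")
```

Counting bulls only, the Holstein Friesian enlargement would be considerably larger than 24%, so the quoted percentages evidently refer to the total number of genotyped reference animals.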

For European dairy cattle populations, Lund et al. (2011) reported a gain in reliability between 2% and 13% for protein yield when a joint reference population consisting of Norwegian, German, French and Dutch Holstein Friesian bulls was compared to the results from the separate national reference populations. Sharing genotypes has been shown to be especially profitable if one population is very small. Zhou et al. (2013) increased the Chinese Holstein Friesian reference population nearly fourfold by adding Nordic reference animals. They obtained an increase in reliability between 25% and 32% for milk performance traits. One reason is the highly consistent LD between these two populations.

The LD between marker and QTL is the key to genomic prediction and thus substantially affects the reliability of predictions. For genetically close populations it is even feasible to use one population to estimate the marker effects for genomic prediction in the second population. Schenkel et al. (2009) demonstrated the successful application of marker effects estimated from the US-American Holstein Friesian reference population for genomic prediction in the Canadian Holstein Friesian population. Compared to the use of the Canadian Holstein Friesian population as a reference population, an increase in reliability between 8% (protein yield) and 10% (milk yield) for milk production traits was observed when a reference population of 4127 US-American Holstein Friesian bulls was used for genomic predictions in 1097 Canadian Holstein Friesian bulls. The same approach is hardly possible for more distinct populations or populations that belong to different breeds. Studies suggest that applying estimated SNP effects from one breed to another results in poor accuracies (Brøndum et al., 2011; Erbe et al., 2012; Olson et al., 2012). This indicates that the marker-QTL association is breed specific, and thus SNP effects cannot be transferred directly from one breed to another.

However, instead of attempting to transfer marker effects from one breed to another, the use of joint reference populations can bring considerable advantages, especially for small populations. There is now a substantial body of work on multi-breed reference populations. Some studies have focused on the Nordic dairy cattle populations. The combination of Danish Red, Swedish Red, Finnish Red (Brøndum et al., 2011) and Norwegian Red (Zhou et al., 2014) into one joint reference set showed that an increase in reliability of genomic prediction can be reached by combining reference populations from related populations. The amount of gain in reliability changed with the genetic relationship level between the combined breeds and with the individual trait. While accuracies for production traits increased significantly with the jointly estimated marker effects, reproduction and health traits did not benefit (Brøndum et al., 2011; Zhou et al., 2014). Olson et al. (2012) observed an improvement in accuracies from a mixed reference population consisting of American Holstein, Jersey and Brown Swiss over the accuracies from the within-breed estimation. Two studies on a multi-breed reference population consisting of Australian Holstein Friesian and Australian Jersey did not find any increase in reliabilities for Holstein (Hayes et al., 2009; Erbe et al., 2012), but a slight increase could be observed for Jersey when using high-density panels and Bayesian approaches (Erbe et al., 2012). The findings of studies applying multi-breed reference sets agree that the gain in reliability is highest for the population with the fewest observations in the reference (Erbe et al., 2012; Karoui et al., 2012; Olson et al., 2012). Often, only the smaller population benefits, while the joint reference is hardly useful for the other breeds in the reference population (Calus et al., 2014; Hozé et al., 2014).

The extent of the gain in reliability that might be reachable in general depends on the proportion of the newly available information to the already existing information (Calus et al., 2014). If there is already plenty of information available, an increase in genomic prediction accuracies by adding another breed will hardly be feasible (Simeone et al., 2012). Thus, the amount of information that comes from an additional population strongly influences the benefit from multi-breed genomic predictions (Calus et al., 2014; Hozé et al., 2014). Additionally, the characteristics of the respective trait play an important role for the gain in reliability. The heritability of the individual trait (Karoui et al., 2012; Zhou et al., 2014) and the genetic architecture of the QTL significantly influence the gain in reliability.

Wientjes et al. (2014) found allele frequencies of QTL to be key parameters, in which low frequencies reduce accuracies in general. Additional factors influencing the amount of the gain in reliability in genomic predictions within a multi-population context are the relatedness of the joint breeds, appropriate statistical methods and the actual value of new information. These factors are associated with the consistency of LD between different populations within a multi-population reference set.

Statistical approaches for multi-breed genomic prediction

For genomic predictions within a multi-breed context, there are two main approaches: the genomic best linear unbiased prediction (GBLUP)-based models and the Bayesian models.

The GBLUP approach can be used for either single-trait or multi-trait models. In the single-trait approach, all breeds are considered to be the same population. Thus, marker effects are assumed to be the same across breeds. Accordingly, single-trait GBLUP models perform well under the condition of existing long-range LD and a high relationship level (de los Campos et al., 2013). Therefore, single-trait GBLUP is an appropriate method if breeds are closely related (de los Campos et al., 2013; Calus et al., 2014) and the phenotypes are measured in the same way (Lund et al., 2014).
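A minimal sketch of single-trait GBLUP on a pooled reference set may help to make the assumption of common marker effects concrete. The code below is an illustration under stated assumptions, not the implementation used in any of the cited studies. It builds one genomic relationship matrix G from allele-frequency-centred genotypes (a common construction, often attributed to VanRaden), estimates allele frequencies from the pooled sample, fits a simple mean instead of a full fixed-effects model, and takes the heritability as known. Genotypes and phenotypes are invented toy data.

```python
def genomic_relationship(genotypes):
    """G = Z Z' / (2 * sum p(1-p)) from 0/1/2 genotype codes (animals x markers)."""
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / scale
             for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gblup(g, y, ref_idx, h2):
    """Genomic values (deviations from the mean) for all animals.

    Uses g_hat = G[all, ref] (G[ref, ref] + lambda I)^-1 (y_ref - mean),
    with lambda = (1 - h2) / h2.
    """
    lam = (1.0 - h2) / h2
    mean = sum(y[i] for i in ref_idx) / len(ref_idx)
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in ref_idx] for i in ref_idx]
    alpha = solve(lhs, [y[i] - mean for i in ref_idx])
    return [sum(g[i][j] * a for j, a in zip(ref_idx, alpha)) for i in range(len(g))]


# Toy data: animals 0-3 form the pooled reference, animal 4 is a candidate.
genotypes = [
    [2, 1, 0, 2, 1, 0],
    [2, 1, 0, 1, 1, 0],
    [0, 1, 2, 0, 1, 2],
    [0, 2, 2, 0, 0, 2],
    [2, 0, 0, 2, 1, 1],
]
phenotypes = [1.2, 0.9, -0.8, -1.1, 0.0]  # last value is a placeholder, unused

G = genomic_relationship(genotypes)
gebv = gblup(G, phenotypes, ref_idx=[0, 1, 2, 3], h2=0.5)
print([round(v, 2) for v in gebv])
```

Because a single G and a single set of phenotypes are used, animals from all contributing populations are treated as members of one population, which is exactly the assumption that becomes questionable when breeds are only distantly related.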

The GBLUP multi-trait model treats phenotypes from different populations as different but correlated traits. Thus, it accounts for interactions between marker and breed as well as for interactions between marker and environment (Lund et al., 2014). Multi-trait GBLUP models have been applied in several studies on multi-breed reference populations in dairy cattle (Karoui et al., 2012; Zhou et al., 2013; Haile-Mariam et al., 2015). For production traits analyzed in three different US-American dairy cattle populations, the gain in accuracy of predictions from the multi-trait model over those from the single-trait model was very small (0% to 3%) (Olson et al., 2012). Using breed-specific models in multi-breed predictions including combinations of other breeds might result in a higher gain in accuracy due to different marker-breed or marker-environment interactions.

In general, the limitation of GBLUP models is the assumption that the variance and covariance of SNP are the same across the genome (Lund et al., 2014). In contrast, Bayesian approaches allow for an individual genomic variance for each marker, dependent on its association with the respective trait. Besides the advantage of considering the individual explained variance of each SNP, benefits may also occur in situations where only a part of the QTL segregates across breeds and where the LD phase between marker allele and QTL allele differs among the joint breeds (Erbe et al., 2012). With the non-linear models, more emphasis can be placed on information from animals that are closer to each other (Calus et al., 2014). Penalized methods have been widely applied in multi-population contexts (Erbe et al., 2012; Olson et al., 2012; Hozé et al., 2014; Wiggans et al., 2015). Erbe et al. (2012) analyzed within- and across-breed calculations. The authors report that the differential shrinkage methods outperformed the GBLUP method, especially in cases where a multi-breed reference set consisting of Holsteins and Jerseys was used. In this case, the improvement in accuracy for the smaller breed was remarkably high (up to 15%).

Calus et al. (2014) stated that an additional slight increase could arise from a combined prediction using linear and non-linear models. Using features of both models allows for building a prediction model of increased flexibility compared to either model by itself. In addition, the linear genomic prediction with a multi-trait model delivers an estimated genetic correlation between the same traits in two different breeds, which helps to examine the potential of the information from one breed for the other breed.

Overall, a decision on the combination of populations from different breeds or different countries into one reference population strongly depends on the relationship level between the populations. The use of across-population information for genomic prediction offers little or no benefit for large populations, while it is a promising opportunity for small populations, provided that a closely related population is available. Statistical methods, marker density, and LD structure need a thorough analysis to work out whether and in which way joint reference populations are useful in the individual situation.

Including female information into reference populations

For the estimation of SNP effects, genotyped animals with available phenotype information are used. Usually, these animals are proven bulls, and their estimated breeding values, which are based on large daughter groups, serve as phenotypes. In small populations, the number of sires, the number of daughter records per sire, or both are often limited. In that case, the inclusion of females in the reference population could offer a reasonable solution. However, in medium-sized or large populations, too, the number of bulls with tested progeny will decrease, and replacement of reference individuals can become difficult with males only.

Influence on genomic prediction accuracy and genetic gain

An example of a large-scale incorporation of females in the prediction of SNP effects is the Australian 10 000 Holstein genomes project. During this project, genotype and phenotype information from 10 000 Holstein cows from dairy farms was collected and introduced into the Australian reference population. As a result, the accuracy of genomic-based breeding values increased significantly for young bulls (around 8%), for cows and for hard to measure traits in general (Pryce et al., 2012a). Actually, numerous reference populations already include information on genotyped females; the US-American reference population, for example, contains a significant female proportion (van Raden et al., 2012).

However, the limited availability of individuals for genomic reference populations is not only a problem of numerically small populations. Problems also arise whenever traits are hard to measure, for example if they are expressed later in life, are visible in one sex only, or no routine recording exists yet. Several studies on the use of cow reference populations exist to investigate the effect of indicator traits for scarcely recorded traits (de Haas et al., 2011; Pszczola et al., 2013), expensive to measure traits like residual feed intake (Pryce et al., 2012b) or progesterone-based fertility traits (Berry et al., 2012), and newly established traits (e.g. direct health traits, Egger-Danner et al., 2014).

Using a deterministic approach, Buch et al. (2012) showed that the scarcer the phenotype data, the larger the effect of adding cows to the reference population. Simulation studies have shown the large value of genotyped cows for new traits, since it enables genomic predictions that can reach reasonable accuracies. Accordingly, it accelerates the availability of genomic estimated breeding values (GEBV) and thus enables the selection for new traits (Buch et al., 2012; Calus et al., 2012; Egger-Danner et al., 2014). Egger-Danner et al. (2014) demonstrated that GEBV for direct health traits become available when cows with reliable phenotypes are genotyped. Generally speaking, in case of limited resources, for example, due to small population size or rare phenotypes, it is more efficient to genotype females instead of males only (Buch et al., 2012; Pszczola et al., 2013; Gonzalez-Recio et al., 2014). Thomasen et al. (2014b) demonstrated that the inclusion of cows into reference populations is a solution to increase the competitiveness of small dairy populations.

The achievable gain in genetic response when using a cow reference population and the required number of cows to reach this gain strongly depend on the economic value of the traits, their correlation with the index (Calus et al., 2012), and their heritability (Buch et al., 2012; Calus et al., 2012). Real data studies have confirmed the benefit of genotyped cows for the accuracy of genomic prediction (Calus et al., 2013; Lourenco et al., 2014a). The level of this gain depends on the proportion of cows and bulls in the reference population. Calus et al. (2013) reported an increase in accuracy of 4% to 9% when using a reference population of 1609 cows instead of 296 bulls and an increase of another 1% to 5% when using the combined population. Lourenco et al. (2014a) observed a positive but very small effect of 1% to 2% when including 343 elite cows into a reference population of 1305 bulls.

Selection of informative animals

The inclusion of elite females is a critical aspect when it comes to the selection of the most beneficial individuals from the female population. Elite cows tend to be selected when genotyping costs are comparably high. However, preferential treatment of bull dams can result in over-estimated genomic predictions; hence, a specific treatment of these cow records is advisable (Dassonneville et al., 2012). Even though the results published by Lourenco et al. (2014a) indicate no adverse effects on evaluation accuracy when including elite cows, the selection strategy for female candidates should be to maximize the phenotypic variance that can be captured by these cows. Jiménez-Montero et al. (2012) have tested different female genotyping strategies in terms of genomic prediction accuracy. The authors suggest a two-tailed strategy, in which females with lower and upper extreme values within the yield deviation distribution should be genotyped, especially for small population sizes. In general, the double counting of female information should be avoided, since the simultaneous inclusion of the cow’s own milk performance and her sire’s daughter yield deviations may lead to biased results (Calus et al., 2013).
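A two-tailed choice of genotyping candidates can be sketched as follows. This is only an illustration of the idea of sampling both extremes of the yield deviation distribution; the function name, the equal split between the two tails, and the input layout are assumptions and are not taken from Jiménez-Montero et al. (2012).

```python
import numpy as np

def two_tailed_selection(yield_dev, n_genotype):
    """Return indices of the cows in the lower and upper tails.

    yield_dev  : 1-D array of yield deviations, one value per cow
    n_genotype : total number of cows that can be genotyped
    The budget is split evenly between both tails (an assumption);
    an odd budget gives the extra animal to the upper tail.
    """
    yield_dev = np.asarray(yield_dev, dtype=float)
    if n_genotype > yield_dev.size:
        raise ValueError("cannot genotype more cows than are available")
    order = np.argsort(yield_dev)           # ascending yield deviation
    n_low = n_genotype // 2
    n_high = n_genotype - n_low
    low = order[:n_low]                     # lowest yield deviations
    high = order[order.size - n_high:]      # highest yield deviations
    return np.concatenate([low, high])
```

Compared with genotyping only the top cows, such a sample covers more of the phenotypic variance, which is the property the text asks for.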

As described by Pszczola et al. (2012), maximizing the accuracy of genomic prediction requires minimizing the relationship among animals within the reference population while maximizing the relationship between animals in the reference and validation populations. Even reference populations of randomly selected individuals performed well in some studies (Pszczola et al., 2012). Since the reliability of genomic prediction decreases as the genetic distance between reference population and selection candidates increases (Habier et al., 2010), reference populations need to be continuously updated to avoid a decline in relationships between the reference and the predicted populations and thus a decrease in accuracy of genomic prediction (Pryce et al., 2012a). Pszczola et al. (2014) calculated a loss of accuracy per generation of 7% in a simulated reference population of 2000 cows in which no animals were added over the years. If the goal is to maintain the original level of accuracy, it is necessary to include new animals, whereby the selection of new individuals to enter the reference population is apparently more important than their actual quantity (Pszczola et al., 2014). The opposite situation, removing data from former generations, is a practicable way to reduce computation requirements without loss in reliability. However, this can be different, and thus problematic, in small populations, especially if multi-breed reference populations are applied. Then phenotypic records of additional generations may result in a reliability increase for the one breed and in a decrease for the other (Lourenco et al., 2014b).

Additional efforts when adding cows to the reference population

In general, cows are less informative than bulls due to the lower reliability of the EBV serving as phenotypes. Depending on the heritability of the trait, 3 to 10 genotyped females are necessary to replace one genotyped bull (Boichard et al., 2015). From an economic point of view, cow reference populations are preferred if the heritability of the trait as well as the costs of phenotyping are high (more than a few hundred dollars per cow) (Gonzalez-Recio et al., 2014). For all other cases, Gonzalez-Recio et al. (2014) suggest a reference population of a relatively high number of genotyped sires with small progeny group sizes (e.g. 20 equivalent daughters). However, small or medium-size populations suffer from a limited number of males and may not be able to fulfill these conditions. For many regional breeds the male reference populations are critically small in numbers (Boichard et al., 2015). Additionally, a closely related breed will not always be available, or the conservation of genetic originality of the individual breed is the predominant goal. Then the inclusion of females into the reference population is the only way to assemble a reference population of sufficient size. Since the relative costs are disproportionately high in small populations compared with larger populations, other kinds of profitable utilization of genomic prediction information can justify and compensate for the economic effort. The possibilities of inbreeding control, improved herd management or evaluation of new traits are discussed in detail in later parts of this paper. Additionally, the recording, analysis, and genetic evaluation of complete contemporary groups of cows are of distinct value, especially if health traits are in the focus of interest.
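Why several cows are needed per bull, and why the number depends on heritability, can be made plausible with a back-of-the-envelope sketch. It rests on two textbook assumptions that are not taken from Boichard et al. (2015): the reliability of a single own record equals the heritability h², and the reliability of a sire's progeny mean from n daughters is n / (n + (4 − h²) / h²). Treating the ratio of these reliabilities as the number of cows per bull is a simplification, and both function names are hypothetical.

```python
def sire_reliability(h2, n_daughters):
    """Reliability of a sire's progeny-based EBV (half-sib daughters)."""
    k = (4.0 - h2) / h2
    return n_daughters / (n_daughters + k)

def cows_per_bull(h2, n_daughters):
    """Crude number of cows with one own record (reliability h2)
    that carry as much information as one progeny-tested bull."""
    return sire_reliability(h2, n_daughters) / h2
```

Under these assumptions the ratio grows as heritability falls, which is in line with the statement that more females are needed per bull for low-heritability traits.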

Currently, especially in the Holstein population, a strong trend towards massive genotyping of females on the initiative of farmers as well as breeding organizations can be observed. First, it has to be checked whether these cows constitute an unselected sample. If this is questionable, it may help to extract parts of the data to fulfill the requirement of randomness. Second, these samples will be useful for standard traits, but again the question arises where new phenotypes for functional traits, health traits, as well as other new traits of future interest will come from. Small populations should seek to implement their own programs for female genotyping, considering the unbiasedness of the sample as well as the possibility for the recording of new traits. Additionally, the massive efforts of large populations for genotyping of females could be exploited if causal variants are identified on a larger scale.

In conclusion, the inclusion of females in the reference population can be a good strategic choice, especially when population size is small or traits are expensive to measure. However, the additional effort and the additional gain have to be balanced diligently in advance, and the selection of informative cows needs to be well considered.

Imputation of un-genotyped animals

Un-genotyped animals can be included in the reference population via imputation as well as by incorporating these animals in the relationship matrix of a single-step procedure for the prediction of genomic breeding values. However, as Pszczola et al. (2011) have pointed out, a gain in accuracy of genomic breeding values can only be achieved if genotypes for un-genotyped animals can be predicted with a high accuracy. With the implementation of genomic selection, the imputation of genotypes became a routine application. Usually, imputation serves to assign the genotype in case single SNP alleles have not been called during the technical genotyping process and thus are missing. Additionally, the imputation from genotypes at low-density chips to genotypes at high-density chips (Daetwyler et al., 2012; Erbe et al., 2012) or recently from high-density chips to whole genome information (Brøndum et al., 2015) has become of interest. Based on the existence of high LD between close markers, a genotyped marker allows for inference of the genotype at a nearby un-genotyped locus. Accordingly, the imputation of genotypes at un-genotyped loci from low-density chips to high-density chips is feasible (Daetwyler et al., 2012; Hickey et al., 2012).

Another application of imputation, distinct from what is usually referred to as population-based imputation, is the so-called pedigree-based imputation (Pimentel et al., 2013). Large half-sib families and sires with large numbers of progeny characterize the population structure in dairy cattle. These circumstances allow the genotype of an un-genotyped animal to be inferred with the help of the genotype information from its relatives. Imputation of completely un-genotyped animals has the potential to enlarge the reference population for genomic selection and thus to increase the reliability of genomic prediction (Hickey et al., 2012; Pimentel et al., 2013).

The imputation of un-genotyped individuals is particularly beneficial if the phenotype of an individual exists but the genotype does not. For instance, this is the case for most cows or for historical data sets, where DNA samples are no longer available. Pursuing the pedigree-based imputation strategy with un-genotyped individuals requires a set of closely related genotyped animals. Pimentel et al. (2013) derived an algorithm to impute un-genotyped dams combining genotype information from the sire of each dam, one offspring, and the offspring’s sire. The addition of these dams to the reference population resulted in a significant increase of the genomic prediction accuracies (up to 37.2%). This method proved to be particularly beneficial for populations with lower LD level, for low heritability traits, and for species with a limited reference population size (Pimentel et al., 2013). Hickey et al. (2012) developed an imputation method based on segregation analyses, phasing rules, long-range phasing and haplotype library information and found it to be an efficient and accurate method for data sets including pedigrees of up to 25 000 animals. Integrating this information into genomic prediction resulted in increased accuracies of GEBV. Bouwman et al. (2014) analyzed different scenarios of imputation of un-genotyped animals using different combinations of relative information and different imputation settings with versus without phasing. The study revealed that applying basic inheritance rules and the use of segregation analyses would be favorable. The existence of genotyped offspring in addition to sire and maternal grandsire genotype information proved to be especially helpful. However, specific imputation algorithms for un-genotyped individuals using LD and pedigree information are required (Bouwman et al., 2014).
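What "basic inheritance rules" can and cannot recover for an un-genotyped dam at a single locus is illustrated by the sketch below. It is a deliberately simple, deterministic illustration and not the algorithm of Pimentel et al. (2013), Hickey et al. (2012) or Bouwman et al. (2014); genotypes are assumed to be coded as allele counts 0, 1 or 2, both function names are hypothetical, and Mendelian inconsistencies are not checked.

```python
def transmitted_allele(offspring, other_parent):
    """Allele (0 or 1) the focal parent must have passed on, or None.

    offspring, other_parent: genotypes coded as counts of one allele.
    """
    if offspring == 0:
        return 0            # both parents passed on allele 0
    if offspring == 2:
        return 1            # both parents passed on allele 1
    # Heterozygous offspring: resolvable only if the other parent
    # is homozygous and therefore passed on a known allele.
    if other_parent == 0:
        return 1
    if other_parent == 2:
        return 0
    return None

def infer_dam_genotype(dam_sire, offspring, offspring_sire):
    """Infer the dam's genotype from her sire, one offspring and
    the offspring's sire; returns 1 or None if undetermined."""
    # Allele the dam received from her own sire (known if he is homozygous).
    from_sire = {0: 0, 2: 1}.get(dam_sire)
    passed_on = transmitted_allele(offspring, offspring_sire)
    if from_sire is None or passed_on is None:
        return None
    if from_sire != passed_on:
        return 1            # she demonstrably carries both alleles
    return None             # same allele observed twice: still ambiguous
```

With one offspring, only heterozygous dams can be identified with certainty at a locus; the remaining loci need probabilistic treatment, for example segregation analysis or LD information, which is consistent with the call for specific algorithms above.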

Additional aspects for small populations

The implementation of genomic selection focuses on direct genomic prediction from a reference set consisting of related individuals. Since costs of genomic breeding schemes are disproportionately high in small populations, additional applications are important for a profitable use of genomics. Genomic selection provides the chance to integrate new phenotypes into breeding programs. This applies especially to the integration of phenotypes that can be recorded on a small scale only. As reviewed by Boichard and Brochard (2012), these new traits will likely be related to disease resistance, feed efficiency, milk composition, or adaptability to environment.

The practical use of genotyping young heifers becomes more and more important as genotyping costs decrease. Female GEBV can be used to intensify the selection on the female pathway. Pryce et al. (2012a) showed the potential to identify elite females that can either be sold at good prices, or selected for embryo transfer and the marketing of embryos. Quite naturally, one way of making use of genomic breeding values would be their use in the selection of replacements within herd. Unfortunately, for many herds, culling rates are relatively high and thus there is limited room for selection among replacements (Pryce et al., 2012a). In addition to efficient within-herd selection, GEBV of young as well as older females allow for improvements in the planning of matings.

Another positive use is the verification of parents. Once a calf is genotyped, it can be assigned easily to its sire as well as to its dam, provided the parents are genotyped (Pryce et al., 2012a). This might be especially beneficial in low-input housing systems where calves are born unobserved. An additional advantage, especially for small populations, arises from the improved management of genetic variability. The genetic management of populations of small size or of small effective population size should not only focus on genetic gain. Further, partly conflicting, goals are genetic diversity and, for genetically unique breeds, the preservation of the original genetic make-up of the population. Genomic information also helps to improve understanding of the population structure in terms of migration rates. Thus, genetic drift and the resulting loss of genetic diversity become somewhat controllable, if the specific population is monitored routinely. Boichard et al. (2015) illustrated the potential of genomic information to increase genetic variance by targeted selection of rare variants. Since favorable but rare alleles can get lost due to genetic drift, the allocation of higher weights to these alleles could increase their frequency.
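The parentage check mentioned at the start of this passage is commonly based on counting opposing homozygotes, that is, loci at which calf and putative parent are homozygous for different alleles, which is impossible under Mendelian inheritance. The sketch below assumes genotypes coded 0, 1 or 2 with any other value treated as missing; the function names and the 1% tolerance for genotyping errors are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

def opposing_homozygotes(calf, parent):
    """Count loci where calf and parent are opposite homozygotes.

    Returns (number of conflicts, number of loci with both genotypes called).
    """
    calf = np.asarray(calf)
    parent = np.asarray(parent)
    valid = np.isin(calf, (0, 1, 2)) & np.isin(parent, (0, 1, 2))
    conflicts = valid & (np.abs(calf - parent) == 2)
    return int(conflicts.sum()), int(valid.sum())

def is_plausible_parent(calf, parent, max_conflict_rate=0.01):
    """Accept the parent if the conflict rate stays below a tolerance
    that allows for genotyping errors (threshold is an assumption)."""
    n_conflicts, n_valid = opposing_homozygotes(calf, parent)
    if n_valid == 0:
        return False        # no information, parentage cannot be confirmed
    return n_conflicts / n_valid <= max_conflict_rate
```

In practice the same count can be computed against every genotyped candidate sire or dam, and the candidate with essentially no conflicts is assigned.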

The challenges of phenotyping

Genomic selection has the potential to use new phenotypes (Boichard and Brochard, 2012; Calus et al., 2012). However, the quality and quantity of these phenotypes remain a critical issue (Boichard and Brochard, 2012). The following sections deal with aspects of precise phenotyping, present novel and changed phenotypes in dairy cattle breeding and discuss their potential to enter genomic selection breeding schemes.

New traits: new challenges

The potential for new traits in modern dairy cattle breeding programs is diverse, as reviewed by Boichard and Brochard (2012) and Egger-Danner et al. (2015). There are at least three essential sources for new traits. First, many new traits potentially already exist (Boichard and Brochard, 2012), but they are not used for breeding purposes yet, such as information stemming from herd management systems or routine milk recording. Second, pre-existing traits need to be defined in a new way. Third, completely new traits often require the establishment of a new recording system from scratch. The key aspect for all traits that are supposed to be implemented into breeding programs is data quality. Precise evaluation, standardized data management, and centralized data analyses are crucial points for high quality phenotypes.

Strategies for phenotyping

The recording of numerous traits has a long history in dairy cattle breeding. One of the fundamental principles of milk recording has been the recording of entire herds at a specific point in time. This principle statistically enables us to work with defined contemporary groups and has been very successful for genetic evaluations. Hence, the same principle should apply to other ways of collecting data as well, that is, the formation of suitable contemporary groups has to be the ultimate goal. For health traits, or rather for the recording of disease events, this principle has often been neglected, as individual disease events are collected as they flow in. However, diseases may remain undetected and thus not be recorded. Rather, an assessment of the health status for the entire herd at a given point in time for every disease or disorder of interest would be optimal. The outcome would be an exact assessment of the disease status for every animal. This ultimate goal, however, may not be achievable or may be too costly to be practically implemented. For individual diseases, however, such a recording may be feasible. An example is the recording of hoof disorders at the time of hoof trimming of the entire herd, which yields contemporary groups comparable to those in milk recording, and thus the data are highly suitable for statistical analysis (e.g. Schöpke et al., 2013).

Many recording schemes for desirable traits may not be feasible to implement on a large scale or for an entire population. In this case, a natural way would be to implement such recording schemes for cooperating herds, also called contract herds. In New Zealand, contract herds have been used for sire progeny testing schemes since 1960 (LIC, 2015) and have repeatedly been suggested for use in genomic breeding programs as a basis for collecting precise phenotypes (e.g. König and Swalve, 2009). Contract herds are contracted for the recording of standard as well as additional traits. Additional traits could be based on technology implemented in those herds, for example bio-markers in milk or 3D images of the body, or on applying the principle of forming proper contemporary groups, as done in the recording of hoof diseases at the time of trimming. A key feature of contract herds is supervision, such that documentation and recording on-farm are carried out to the highest standards. This especially applies to all kinds of recordings of health traits. Other traits of interest that do not require a high standard of technology are weights of the animals. Birth weights will be a valuable source of data to supplement recording of calving ease and stillbirth, and small reference populations with thorough data recording can serve to develop predictors for the broad population with the help of indicator traits (Cole et al., 2014). It has to be noted, though, that this should also include weighing of stillborn calves. Weights of replacement females and mature cows can aid in genetic evaluations for energy balance and metabolic disorders.

Health traits: well-structured recording required

The issue of health traits is a topic of current interest that bears chances and challenges at the same time. At present, the main challenge is the availability of appropriate data. The first potential suppliers of direct health information are veterinarians. They can provide a valuable continuous documentation on diagnoses, treatments and prescriptions. This continuous documentation is urgently needed for the implementation of direct health data into breeding programs, but it requires standardization and centralization (Gernand et al., 2012; ICAR, 2012). The earliest documentation for health traits in dairy cattle has been established in the Nordic countries since the 1970s (ICAR, 2012), and these data have been included in national evaluation systems (Phillipson and Lindhé, 2003). As reviewed by Egger-Danner et al. (2015), routine genetic evaluations for direct health traits also exist in Austria (since 2010), in Germany (since 2010), in France (since 2012) and in Canada (since 2013). Further possible sources of information are farmers or expert groups, for instance claw trimmers or nutritionists. Data can further arise from laboratories or on-farm equipment (ICAR, 2012). Egger-Danner et al. (2015) emphasize the motivation of the involved stakeholders to be a key requirement for successful recording.

Besides the recording of direct health data, the use of indirect records is a possible solution. Observations of somatic cell count, body condition score, conformation scores, etc., can serve as indicator traits for the animal’s health. However, the application of direct health traits is more convenient from a cost perspective than the use of indicator traits (Parker Gaddis et al., 2014). A general consideration for the collection of health data is whether to diagnose only affected cows or to record whole cohorts of animals. Including complete contemporary groups has the benefit that affected and non-affected cows contribute to the later analysis, which is particularly advantageous for association studies as shown by Swalve et al. (2014). An example of an improved definition of an existing trait is the claw disorder dermatitis digitalis (DD). A study conducted by Schöpke et al. (2015) showed that using a more sophisticated DD trait definition resulted in higher estimates of heritability than known from previous studies and delivers an alert tool for practical purposes.

In addition to the aforementioned use of health records within national evaluation systems, there are other valuable applications of health data. On the farm level, individual farm health records are helpful information for farmers and veterinarians, and they can help to optimize the herd management and thus improve the herd health status (ICAR, 2012). The monitoring of the health status on a population level might be interesting for ministries and consumers that are concerned about the status of food safety and animal welfare.

For the main livestock species, a Welfare Quality Assessment protocol has been developed that provides a description of assessment procedures (Welfare Quality, 2009). For dairy cattle, some of the therein-defined welfare indicators are well-known herd management traits, such as somatic cell count or body condition score, or traits that can be found in health documentation (e.g. lameness, vulvar discharge, diarrhea, ocular discharge). Routinely collected herd data have been shown to allow a prescreening for the discovery of herds having welfare problems (de Vries et al., 2014). Accordingly, routinely collected herd data can improve the classification of herd welfare (Nyman et al., 2011) and facilitate the time-consuming and, therefore, costly on-farm assessment of the welfare indicators.

Consequently, accurate and continuous health data recording supports the individual farm management, genetic evaluations, and the current demands on sustainable and cost-efficient milk production in accordance with public expectations.

Conclusion

In the genomic era, the distinct problem of limited reference information arises for numerically small populations, as well as for hard to measure traits. To overcome the problem of limited information, several strategies exist. The use of joint reference populations can be a beneficial solution. In this respect, the formal belonging to a country or breed is not essential. In fact, the consistency of LD across the combined populations plays the important role, and it is more often given in reference populations consisting of individuals of the same breed but from different countries than in multi-breed reference populations. Especially in small reference populations, the family relationship level is of distinct importance. Initiating a multi-population reference set is often beneficial for small populations, but it requires detailed knowledge of family relationships, allele frequencies, and LD patterns to design a sophisticated make-up. The inclusion of females into reference populations and the imputation of un-genotyped individuals with the help of genotyped relatives are promising strategies if the selection of animals is well considered. A combination of both approaches may be suggested. Improved exploitation of reference populations created at high cost, for example through selection on the herd level, individual mating plans, conservation strategies, inbreeding control, or parentage verification, is recommended.

Acknowledgement

The authors gratefully acknowledge the work of two anonymous reviewers for critically reading the manuscript and suggesting substantial improvements.

References

Berry, DP, Bastiaansen, JWM, Veerkamp, RF, Wijga, S, Wall, E, Berglund, B and Calus, MPL 2012. Genome-wide associations for fertility traits in Holstein–Friesian dairy cows using data from experimental research herds in four European countries. Animal 6, 1206–1215.

Boichard, D and Brochard, M 2012. New phenotypes for new breeding goals in dairy cattle. Animal 6, 544–550.

Boichard, D, Ducrocq, V and Fritz, S 2015. Sustainable dairy cattle selection in the genomic era. Journal of Animal Breeding and Genetics 132, 135–143.

Bouwman, AC, Hickey, JM, Calus, MPL and Veerkamp, RF 2014. Imputation of non-genotyped individuals based on genotyped relatives: assessing the imputation accuracy of a real case scenario in dairy cattle. Genetics Selection Evolution 46, 6–17.

Brøndum, RF, Rius-Vilarrasa, E, Strandén, I, Su, G, Guldbrandtsen, B, Fikse, WF and Lund, MS 2011. Reliabilities of genomic prediction using combined reference data of the Nordic Red dairy cattle populations. Journal of Dairy Science 94, 4700–4707.

Brøndum, RF, Su, G, Janss, L, Sahana, G, Guldbrandtsen, B, Boichard, D and Lund, MS 2015. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. Journal of Dairy Science 98, 4107–4116.

Buch, LH, Kargo, M, Berg, P, Lassen, J and Sørensen, AC 2012. The value of cows in reference populations for genomic selection of new functional traits. Animal 6, 880–886.

Calus, MPL, de Haas, Y and Veerkamp, RF 2013. Combining cow and bull reference populations to increase accuracy of genomic prediction and genome-wide association studies. Journal of Dairy Science 96, 6703–6715.

Calus, MPL, de Haas, Y, Pszczola, M and Veerkamp, RF 2012. Predicted accuracy of and response to genomic selection for new traits in dairy cattle. Animal 7, 183–191.

Calus, MPL, Huang, H, Wientjes, YCJ, ten Napel, J, Bastiaansen, JWM, Price, MD, Veerkamp, RF, Vereijken, A and Windig, JJ 2014. (A)cross-breed Genomic Prediction. In: Proceedings of 10th WCGALP2014. 17 to 22 August 2014, Vancouver, BA, Canada.Google Scholar

Clark, SA, Hickey, JM, Daetwyler, HD and van der Werf, JHJ 2012. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genetics Selection Evolution 44, 4–13.

Cole, JB, Waurich, B, Wensch-Dorendorf, M, Bickhart, DM and Swalve, HH 2014. A genome-wide association study of calf birth weight in Holstein cattle using single nucleotide polymorphisms and phenotypes predicted from auxiliary traits. Journal of Dairy Science 97, 3156–3172.

Daetwyler, HD, Kemper, KE, van der Werf, JHJ and Hayes, BJ 2012. Components of the accuracy of genomic prediction in a multi-breed sheep population. Journal of Animal Science 90, 3375–3384.

Dassonneville, R, Baur, A, Fritz, S, Boichard, D and Ducrocq, V 2012. Inclusion of cow records in genomic evaluations and impact on bias due to preferential treatment. Genetics Selection Evolution 44, 40–48.

de Haas, Y, Windig, JJ, Calus, MPL, Dijkstra, J, de Haan, M, Bannink, A and Veerkamp, RF 2011. Genetic parameters for predicted methane production and potential for reducing enteric emissions through genomic selection. Journal of Dairy Science 94, 6122–6134.

de los Campos, G, Vazquez, AI, Fernando, R, Klimentidis, YC and Sorensen, D 2013. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genetics 9, 1–15.

de Roos, APW, Hayes, BJ, Spelman, RJ and Goddard, ME 2008. Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus cattle. Genetics 179, 1503–1512.

de Vries, M, Bokkers, EAM, van Schaik, G, Engel, B, Dijkstra, T and de Boer, IJM 2014. Exploring the value of routinely collected herd data for estimating dairy cattle welfare. Journal of Dairy Science 97, 715–730.

Egger-Danner, C, Cole, JB, Pryce, JE, Gengler, N, Heringstad, B, Bradley, A and Stock, KF 2015. Invited review: overview of new traits and phenotyping strategies in dairy cattle with a focus on functional traits. Animal 9, 191–207.

Egger-Danner, C, Schwarzenbacher, H and Willam, A 2014. Short communication: genotyping of cows to speed up availability of genomic estimated breeding values for direct health traits in Austrian Fleckvieh (Simmental) cattle-genetic and economic aspects. Journal of Dairy Science 97, 4552–4556.

Erbe, M, Hayes, BJ, Matukumalli, LK, Goswami, S, Bowman, PJ, Reich, CM, Mason, B and Goddard, ME 2012. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. Journal of Dairy Science 95, 4114–4129.

Falconer, DS and Mackay, TFC 1996. Introduction to quantitative genetics, 4th edition. Pearson, Harlow, England.

Gernand, E, Rehbein, P, von Borstel, UU and König, S 2012. Incidences of and genetic parameters for mastitis, claw disorders, and common health traits recorded in dairy cattle contract herds. Journal of Dairy Science 95, 2144–2156.

Goddard, M 2009. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136, 245–257.

Goddard, ME and Hayes, BJ 2009. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Reviews Genetics 10, 381–391.

Gonzalez-Recio, O, Coffey, MP and Pryce, JE 2014. On the value of the phenotypes in the genomic era. Journal of Dairy Science 97, 7905–7915.

Habier, D, Tetens, J, Seefried, F, Lichtner, P and Thaller, G 2010. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genetics Selection Evolution 42, 5–17.

Haile-Mariam, M, Pryce, JE, Schrooten, C and Hayes, BJ 2015. Including overseas performance information in genomic evaluations of Australian dairy cattle. Journal of Dairy Science 98, 1–17.

Hayes, BJ, Bowman, PJ, Chamberlain, AC, Verbyla, K and Goddard, ME 2009. Accuracy of genomic breeding values in multi-breed dairy cattle populations. Genetics Selection Evolution 41, 51–60.

Hickey, JM, Kinghorn, BP, Tier, B, van der Werf, JHJ and Cleveland, MA 2012. A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation. Genetics Selection Evolution 44, 9–20.

Hozé, C, Fritz, S, Phocas, F, Boichard, D, Ducrocq, V and Croiseau, P 2014. Efficiency of multi-breed genomic selection for dairy cattle breeds with different sizes of reference population. Journal of Dairy Science 97, 3918–3929.

ICAR 2012. ICAR guidelines for recording, evaluation and genetic improvement of health traits. Retrieved February 12, 2015, from http://www.icar.org/Documents/Rules%20and%20regulations/Amendments%202012/Recording,%20Evaluation%20and%20Genetic%20Improvement%20of%20health%20traits.pdf.

Jiménez-Montero, JA, González-Recio, O and Alenda, R 2012. Genotyping strategies for genomic selection in small dairy cattle populations. Animal 6, 1216–1224.

Kariuki, CM, Komen, H, Kahi, AK and van Arendonk, JAM 2014. Optimizing the design of small-sized nucleus breeding programs for dairy cattle with minimal performance recording. Journal of Dairy Science 97, 7963–7974.

Karoui, S, Carabaño, MJ, Díaz, C and Legarra, A 2012. Joint genomic evaluation of French dairy cattle breeds using multiple-trait models. Genetics Selection Evolution 44, 39–49.

Kemper, KE and Goddard, ME 2012. Understanding and predicting complex traits: knowledge from cattle. Human Molecular Genetics 21, R45–R51.

Kemper, K, Hayes, BJ, Daetwyler, HD and Goddard, ME 2015. How old are quantitative trait loci and how widely do they segregate? Journal of Animal Breeding and Genetics 132, 121–134.

König, S and Swalve, HH 2009. Application of selection index calculations to determine selection strategies in genomic breeding programs. Journal of Dairy Science 92, 5292–5303.

LIC 2015. LIC – Livestock Improvement Company. History 1941–1969. Retrieved April 7, 2015, from http://www.lic.co.nz/lic_Historical_Info.cfm?lid=16.

Lourenco, DAL, Misztal, I, Tsuruta, S, Aguilar, I, Ezra, E, Ron, M, Shirak, A and Weller, JI 2014a. Methods for genomic evaluation of a relatively small genotyped dairy population and effect of genotyped cow information in multiparity analyses. Journal of Dairy Science 97, 1742–1752.

Lourenco, DAL, Misztal, I, Tsuruta, S, Aguilar, I, Lawlor, TJ, Forni, S and Weller, JI 2014b. Are evaluations on young genotyped animals benefiting from the past generations? Journal of Dairy Science 97, 3930–3942.

Lund, MS, de Roos, APW, de Vries, AG, Druet, T, Ducrocq, V, Fritz, S, Guillaume, F, Guldbrandtsen, B, Liu, ZT, Reents, R, Schrooten, C, Seefried, F and Su, GS 2011. A common reference population from four European Holstein populations increases reliability of genomic predictions. Genetics Selection Evolution 43, 43–51.

Lund, MS, de Roos, APW, de Vries, AG, Druet, T, Ducrocq, V, Guillaume, F, Guldbrandtsen, B, Liu, Z, Reents, R, Schrooten, C, Seefried, F and Su, G 2010. Improving genomic prediction by EuroGenomics collaboration. In Proceedings of 9th WCGALP 2010, August 1 to 6, 2010, Leipzig, Germany. pp. 7–10.

Lund, MS, Su, G, Janss, L, Guldbrandtsen, B and Brøndum, RF 2014. Genomic evaluation of cattle in a multi-breed context. Livestock Science 166, 101–110.

MacLeod, IM, Hayes, BJ and Goddard, ME 2014. The effects of demography and long-term selection on the accuracy of genomic prediction with sequence data. Genetics 198, 1671–1684.

Nyman, AK, Lindberg, A and Sandgren, CH 2011. Can pre-collected register data be used to identify dairy herds with good cattle welfare? Acta Veterinaria Scandinavica 53 (Suppl 1), 8–14.

Olson, KM, VanRaden, PM and Tooker, ME 2012. Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss. Journal of Dairy Science 95, 5378–5383.

Parker Gaddis, KL, Cole, JB, Clay, JS and Maltecca, C 2014. Genomic selection for producer-recorded health event data in US dairy cattle. Journal of Dairy Science 97, 3190–3199.

Philipsson, J and Lindhé, B 2003. Experiences of including reproduction and health traits in Scandinavian dairy cattle breeding programs. Livestock Production Science 83, 99–112.

Pimentel, ECG, Wensch-Dorendorf, M, König, S and Swalve, HH 2013. Enlarging a training set for genomic selection by imputation of un-genotyped animals in populations of varying genetic architecture. Genetics Selection Evolution 45, 12–24.

Pryce, JE, Hayes, BJ and Goddard, M 2012a. Genotyping dairy females can improve the reliability of genomic selection and provide farmers with new management tools. In Proceedings of 38th ICAR Biennial Session, 28 May to 1 June 2012, Cork, Ireland.

Pryce, JE, Arias, J, Bowman, PJ, Davis, SR, MacDonald, KA, Waghorn, GC, Wales, WJ, Williams, JY, Spelman, RJ and Hayes, BJ 2012b. Accuracy of genomic predictions of residual feed intake and 250-day body weight in growing heifers using 625,000 single nucleotide polymorphism markers. Journal of Dairy Science 95, 2108–2119.

Pszczola, M, Mulder, HA and Calus, MPL 2011. Effect of enlarging the reference population with (un)genotyped animals on the accuracy of genomic selection in dairy cattle. Journal of Dairy Science 94, 431–441.

Pszczola, M, Strabel, T, Mulder, HA and Calus, MPL 2012. Reliability of direct genomic values for animals with different relationships within and to the reference population. Journal of Dairy Science 95, 5412–5421.

Pszczola, M, Veerkamp, RF, de Haas, Y, Wall, E, Strabel, T and Calus, MPL 2013. Effect of predictor traits on accuracy of genomic breeding values for feed intake based on a limited cow reference population. Animal 7, 1759–1768.

Pszczola, M, Strabel, T and Calus, MPL 2014. Size of required reference population updates to achieve constant genomic prediction accuracy across generations. In Proceedings of 10th WCGALP 2014, 17 to 22 August 2014, Vancouver, BC, Canada.

Schaeffer, LR 2006. Strategy for applying genome-wide selection in dairy cattle. Journal of Animal Breeding and Genetics 123, 218–223.

Schenkel, F, Sargolzaei, M, Kistemaker, G, Jansen, G, Sullivan, P, VanDoormaal, BJ, VanRaden, PM and Wiggans, GR 2009. Reliability of genomic evaluation of Holstein cattle in Canada. Interbull Bulletin 39, 51–58.

Schöpke, K, Weidling, S, Pijl, R and Swalve, HH 2013. Relationships between bovine hoof disorders, body condition traits and test-day yields. Journal of Dairy Science 96, 679–689.

Schöpke, K, Gomez, A, Dunbar, KA, Swalve, HH and Döpfer, D 2015. Investigating the genetic background of bovine digital dermatitis using improved definitions of clinical status. Journal of Dairy Science 98, 8164–8174.

Simeone, R, Misztal, I, Aguilar, I and Vitezica, ZG 2012. Evaluation of a multi-line broiler chicken population using single-step genomic evaluation procedure. Journal of Animal Breeding and Genetics 129, 3–10.

Smith, JM and Haigh, J 1974. The hitch-hiking effect of a favourable gene. Genetics Research 23, 23–35.

Sved, JA 1971. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theoretical Population Biology 2, 125–141.

Swalve, HH, Floren, C, Wensch-Dorendorf, M, Schöpke, K, Pijl, R, Wimmers, K and Brenig, B 2014. A study based on records taken at time of hoof trimming reveals a strong association between the IQ motif-containing GTPase-activating protein 1 (IQGAP1) gene and sole hemorrhage in Holstein cattle. Journal of Dairy Science 97, 507–519.

Thomasen, JR, Egger-Danner, C, Willam, A, Guldbrandtsen, B, Lund, MS and Sørensen, AC 2014a. Genomic selection strategies in a small dairy cattle population evaluated for genetic gain and profit. Journal of Dairy Science 97, 458–470.

Thomasen, JR, Sørensen, AC, Lund, MS and Guldbrandtsen, B 2014b. Adding cows to the reference population makes a small dairy cattle population competitive. Journal of Dairy Science 97, 5822–5832.

VanRaden, PM, Olson, KM, Null, DJ, Sargolzaei, M, Winters, M and van Kaam, JBCHM 2012. Reliability increases from combining 50,000- and 777,000-marker genotypes from four countries. Interbull Bulletin 46, 75–79.

Welfare Quality Assessment Protocol for Cattle 2009. Welfare Quality® Consortium. Lelystad, the Netherlands.

Wientjes, YCJ, Calus, MPL, Goddard, ME and Hayes, BJ 2014. Effect of genetic architecture of multi breed genomic prediction. In Proceedings of 10th WCGALP 2014, 17 to 22 August 2014, Vancouver, BC, Canada.

Wientjes, YCJ, Veerkamp, RF and Calus, MPL 2013. The effect of linkage disequilibrium and family relationships on the reliability of genomic prediction. Genetics 193, 621–631.

Wiggans, GR, Su, G, Cooper, TA, Nielsen, US, Aamand, GP, Guldbrandtsen, B, Lund, MS and VanRaden, PM 2015. Short communication: improving accuracy of Jersey genomic evaluations in the United States and Denmark by sharing reference population bulls. Journal of Dairy Science 98, 1–6.

Zhou, L, Ding, X, Zhang, Q, Wang, Y, Lund, MS and Su, G 2013. Consistency of linkage disequilibrium between Chinese and Nordic Holsteins and genomic prediction for Chinese Holsteins using a joint reference population. Genetics Selection Evolution 45, 7–14.

Zhou, L, Heringstad, B, Su, G, Guldbrandtsen, B, Meuwissen, T, Svendsen, M, Grove, H, Nielsen, US and Lund, MS 2014. Genomic predictions based on a joint reference population for the Nordic Red cattle breeds. Journal of Dairy Science 97, 4485–4496.

Figure 1 Parameters and interactions between parameters directly or indirectly influencing the accuracy of genomic prediction in dairy cattle.

