Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications

  • M. P. L. Calus (a1), A. C. Bouwman (a1), J. M. Hickey (a2), R. F. Veerkamp (a1) and H. A. Mulder (a3)...

Abstract

In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, two measures of imputation correctness are commonly used: imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rate, which counts the number of incorrectly imputed alleles. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate does. Imputation accuracy can therefore be compared more readily across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes on the high-density panel, the SNP density on the low- and high-density panels, the MAF of the imputed SNP and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
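
The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes genotypes coded as 0, 1 or 2 copies of one allele, uses invented toy data, and assumes that centring and scaling means subtracting 2p and dividing by sqrt(2p(1-p)), with the allele frequency p estimated from the true genotypes. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector has no variation
    return sxy / sqrt(sxx * syy)


def allele_error_rate(true, imputed):
    """Fraction of incorrectly imputed alleles for 0/1/2 genotype calls."""
    wrong = sum(abs(t - i) for t, i in zip(true, imputed))
    return wrong / (2 * len(true))


def standardise(genotypes, freqs):
    """Centre by 2p and scale by sqrt(2p(1-p)) per SNP (assumed scaling)."""
    return [(g - 2 * p) / sqrt(2 * p * (1 - p)) for g, p in zip(genotypes, freqs)]


def individual_accuracy(true_row, imputed_row, freqs):
    """Correlation across SNPs for one animal, after centring and scaling."""
    return pearson(standardise(true_row, freqs), standardise(imputed_row, freqs))


def dosage(p_het, p_hom):
    """Expected allele count from genotype probabilities P(1) and P(2)."""
    return p_het + 2 * p_hom


# Toy data (invented): rows are animals, columns are SNPs.
true = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1], [1, 2, 1, 0]]
imputed = [[0, 1, 2, 1], [1, 1, 1, 2], [2, 0, 1, 1], [1, 1, 1, 0]]

n_animals = len(true)
# Allele frequency per SNP, estimated from the true genotypes (assumption).
freqs = [sum(col) / (2 * n_animals) for col in zip(*true)]

# SNP-specific accuracy: correlation across animals for the third SNP.
snp_acc = pearson([row[2] for row in true], [row[2] for row in imputed])

# Individual-specific measures for each animal.
ind_acc = [individual_accuracy(t, i, freqs) for t, i in zip(true, imputed)]
ind_err = [allele_error_rate(t, i) for t, i in zip(true, imputed)]
```

In this sketch a perfectly imputed animal has an individual accuracy of 1 and an error rate of 0. The `dosage` helper returns the expected allele count, which can take non-integer values, whereas the most likely genotype is restricted to 0, 1 or 2.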

Corresponding author

E-mail: mario.calus@wur.nl
