
An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle

  • CHUANYU SUN (a1), XIAO-LIN WU (a1) (a2), KENT A. WEIGEL (a1), GUILHERME J. M. ROSA (a2) (a3), STEWART BAUCK (a4), BRENT W. WOODWARD (a4), ROBERT D. SCHNABEL (a5), JEREMY F. TAYLOR (a5) and DANIEL GIANOLA (a1) (a2) (a3)...

Summary

Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection, because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, namely Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2) and FImpute (v2), with each package serving as a basic classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers, and combines results from the component methods via weighted majority ‘voting’ to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved the imputation of 50K genotypes covering 29 chromosomes based on a set of 5K genotypes. Beagle and FImpute had the greatest accuracy among the six imputation packages, which ranged from 0·8677 to 0·9858. The proposed ensemble method was more accurate than any of these packages, but the sequence of independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as the first classifier, followed by one or two methods that utilized pedigree information. A salient feature of the proposed ensemble method is that it can resolve imputation inconsistencies among different imputation methods, hence leading to a more reliable system for imputing genotypes relative to independent methods.
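The weighted majority ‘voting’ step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weight formula is the standard AdaBoost log-odds weight, used here as an assumption, and the sequential re-weighting of classifiers mentioned in the summary is not reproduced. The function names (`adaboost_weight`, `weighted_vote`), the genotype coding (0, 1, 2) and the error rates are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def adaboost_weight(error_rate, eps=1e-6):
    """Standard AdaBoost log-odds weight for a classifier (assumed form)."""
    # Clip the error rate so the logarithm stays finite.
    e = min(max(error_rate, eps), 1.0 - eps)
    return math.log((1.0 - e) / e)


def weighted_vote(calls, weights):
    """Pick the genotype with the largest total weight.

    calls:   dict mapping method name -> imputed genotype code (0, 1 or 2)
    weights: dict mapping method name -> voting weight
    """
    totals = defaultdict(float)
    for method, genotype in calls.items():
        totals[genotype] += weights[method]
    # Iterating in sorted order makes ties resolve to the smallest code.
    return max(sorted(totals), key=lambda g: totals[g])


# Hypothetical validation error rates, not results from the article.
errors = {"Beagle": 0.02, "FImpute": 0.03, "findhap": 0.10}
weights = {m: adaboost_weight(e) for m, e in errors.items()}

# The three methods disagree on one genotype of one animal.
calls = {"Beagle": 1, "FImpute": 2, "findhap": 2}
consensus = weighted_vote(calls, weights)
```

In this toy case the two less accurate methods together outweigh the single most accurate one, so the consensus follows them; with a different split of calls the most accurate method can instead decide the vote.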


Corresponding author

*Corresponding author: 1675 Observatory Dr., Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA. Tel: +6082637824. E-mail: csun28@wisc.edu

