
Use of partial least squares regression to predict single nucleotide polymorphism marker genotypes when some animals are genotyped with a low-density panel

  • C. Dimauro (a1), R. Steri (a1), M. A. Pintus (a1), G. Gaspa (a1) and N. P. P. Macciotta (a1)...


High-density single nucleotide polymorphism (SNP) platforms are currently used in genomic selection (GS) programs to enhance the selection response. However, genotyping a large number of animals with high-throughput platforms is expensive and may constrain large-scale implementation of GS. Low-density marker (LDM) platforms could overcome this problem, but a different SNP chip may be required for each trait and/or breed. In this study, an imputation strategy that is independent of trait and breed is proposed. A simulated population of 5865 individuals with a genome of 6000 SNP equally distributed on six chromosomes was considered. First, reference and prediction populations were generated by mimicking high- and low-density SNP platforms, respectively. Then, the partial least squares regression (PLSR) technique was applied to reconstruct the SNP missing from the low-density chip. The proportion of SNP correctly reconstructed by the PLSR method ranged from 0.78, when 90% of genotypes were predicted, to 0.97, when 50% were predicted. Moreover, data sets consisting either of a mixture of actual and PLSR-predicted SNP or of actual SNP only were used to predict genomic breeding values (GEBVs). Correlations between GEBV and true breeding values were 0.74 and 0.76, respectively. The results of the study indicate that the PLSR technique can be considered a reliable computational strategy for predicting SNP genotypes in an LDM platform with reasonable accuracy.
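The reconstruction step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes genotypes coded as 0, 1 or 2 allele copies, predicts one missing SNP at a time with a single-response PLS (PLS1, NIPALS algorithm) rather than a multi-response PLSR, and uses invented toy data. All function and variable names (`pls1_fit`, `pls1_predict`, `call_genotype`, `proportion_correct`) are hypothetical.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS) on plain Python lists."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                # centred response
    comps = []
    for _ in range(n_components):
        # Weight vector: covariance direction between predictors and response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate predictors and response before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict the (continuous) response for one new animal."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


def call_genotype(value):
    """Assumption: round the continuous prediction to the nearest code in {0, 1, 2}."""
    return min(2, max(0, round(value)))


def proportion_correct(true_codes, predicted_codes):
    """Share of genotypes reconstructed exactly."""
    hits = sum(1 for a, b in zip(true_codes, predicted_codes) if a == b)
    return hits / len(true_codes)


# Toy reference population: two SNP present on the low-density panel (columns)
# and one SNP found only on the high-density panel (the target).
reference_low = [[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [2, 0]]
reference_target = [0, 1, 2, 0, 1, 2]  # in perfect LD with the first SNP

# Toy prediction population: genotyped on the low-density panel only.
prediction_low = [[2, 2], [0, 0], [1, 2]]
true_target = [2, 0, 1]

model = pls1_fit(reference_low, reference_target, n_components=2)
imputed = [call_genotype(pls1_predict(model, x)) for x in prediction_low]
accuracy = proportion_correct(true_target, imputed)
```

Here `accuracy` plays the role of the proportion of SNP correctly reconstructed reported above. The toy target SNP is deliberately an exact copy of one panel SNP, so the sketch shows the mechanics only and says nothing about the accuracy achievable on real data.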



