Methods of plant breeding in the genome era

  • SHIZHONG XU (a1) and ZHIQIU HU (a1)

Summary

Methods of genomic value prediction are reviewed. The majority of the methods are related to mixed model methodology, either explicitly or implicitly, by treating systematic environmental effects as fixed and quantitative trait locus (QTL) effects as random. Six different methods are reviewed, including least squares (LS), ridge regression, Bayesian shrinkage, least absolute shrinkage and selection operator (Lasso), empirical Bayes and partial least squares (PLS). The LS and PLS methods are non-Bayesian because they do not require probability distributions for the data. The PLS method is introduced as a special dimension reduction scheme to handle high-density marker information. Theory and methods of cross-validation are described. The leave-one-out cross-validation approach is recommended for model validation. A working example is used to demonstrate the utility of genome selection (GS) in barley. The data set contained 150 double haploid lines and 495 DNA markers covering the entire barley genome, with an average marker interval of 2·23 cM. Eight quantitative traits were included in the analysis. GS using the empirical Bayesian method showed high predictability of the markers for all eight traits with a mean accuracy of prediction of 0·70. With traditional marker-assisted selection (MAS), the average accuracy of prediction was 0·59, giving an average gain of GS over MAS of 0·11. This study provided strong evidence that GS using marker information alone can be an efficient tool for plant breeding.
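
The leave-one-out cross-validation recommended in the summary can be sketched as follows. This is a minimal illustration on simulated data, not the authors' implementation. Ridge regression stands in for the six methods reviewed, and the -1/+1 marker coding, the unlinked markers, the penalty `lam`, the sample sizes, the use of the training mean as intercept and the use of the Pearson correlation as the accuracy measure are all assumptions made for the example.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_predict(z_train, y_train, z_new, lam):
    """Predict one phenotype with ridge regression in its dual (n x n) form."""
    n = len(y_train)
    mean_y = sum(y_train) / n  # assumption: intercept = training mean
    # K = Z Z' with lam added on the diagonal.
    k = [[sum(p * q for p, q in zip(z_train[i], z_train[j]))
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(k, [v - mean_y for v in y_train])
    k_new = [sum(p * q for p, q in zip(z_new, zi)) for zi in z_train]
    return mean_y + sum(a * kn for a, kn in zip(alpha, k_new))


def loocv(z, y, lam):
    """Refit without line i, then predict line i, for every line in turn."""
    preds = []
    for i in range(len(y)):
        z_train = z[:i] + z[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(ridge_predict(z_train, y_train, z[i], lam))
    return preds


def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def simulate(n_lines=30, n_markers=40, n_qtl=5, seed=1):
    """Simulated lines: unlinked -1/+1 markers, a few QTL, unit noise."""
    rng = random.Random(seed)
    z = [[rng.choice((-1.0, 1.0)) for _ in range(n_markers)]
         for _ in range(n_lines)]
    effects = [0.0] * n_markers
    for j in rng.sample(range(n_markers), n_qtl):
        effects[j] = rng.gauss(0.0, 1.0)
    y = [sum(e * g for e, g in zip(effects, row)) + rng.gauss(0.0, 1.0)
         for row in z]
    return z, y


z, y = simulate()
predicted = loocv(z, y, lam=10.0)
accuracy = pearson(y, predicted)
print(f"LOOCV accuracy (correlation): {accuracy:.2f}")
```

Each phenotype is predicted from a model that never saw it, so the correlation between observed and predicted values estimates how well markers alone would predict an unphenotyped line. The dual form solves an n × n system per fold, which stays tractable when markers greatly outnumber lines, as with 495 markers and 150 lines.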

Corresponding author

*Corresponding author: Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA. e-mail: shxu@ucr.edu
