
Comparison of optimization methods for core subset selection from a large collection of Mexican wheat landraces characterized by SNP markers

  • Carlos L. Acuña-Matamoros (a1) and M. Humberto Reyes-Valdés (a1)


Core subset selection from collections hosted by seed banks grows in importance as the number of accessions and the amount of genetic marker information rapidly increase. A data set of 20,526 single-nucleotide polymorphism (SNP) markers characterizing 7986 Mexican creole wheat landraces was used to test 11 methods for core subset selection, using optimization criteria that combine average genetic distance and genetic diversity. Allele richness was used as an additional criterion to assess the generated core subsets. The methods were evaluated with four different objective functions on three replications, each a random sample of 1500 SNP loci comprising a maximum of 3000 alleles. The LR greedy search (LR) and LR with random first pair (LRSemi) were consistently the best across all assays at maximizing the objective functions, and they performed well even for criteria not included in those functions. Tukey's HSD (honest significant difference) multiple comparisons grouped those methods together with the sequential forward selection (SFS) and SFS with random first pair (SFSSemi) strategies as the top set of approaches. All four are simple heuristic maximization algorithms, and they outperformed two more sophisticated optimization approaches: parallel mixed replica exchange and replica exchange Monte Carlo. Given their efficiency in optimizing the objective functions and their computing speed, the LRSemi and SFSSemi methods proved to be good alternatives for core subset selection from large collections of highly homozygous accessions characterized by many biallelic markers.
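As an illustration of the kind of simple greedy heuristic compared in the study, the following is a minimal sketch of sequential forward selection on a toy data set. It is not the authors' implementation. The function names, the 0/1 allele coding, the mean absolute allele difference used as genetic distance, the use of average pairwise distance as the only objective, and the seeding with the most distant pair are all assumptions made for this example.

```python
from itertools import combinations


def allele_distance(a, b):
    """Mean absolute allele difference between two accessions (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distance_matrix(genotypes):
    """Symmetric matrix of pairwise distances between all accessions."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = allele_distance(genotypes[i], genotypes[j])
    return dist


def mean_pairwise_distance(dist, subset):
    """Objective in this sketch: average distance over all pairs in the subset."""
    pairs = list(combinations(subset, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def sfs_core(dist, core_size):
    """Greedy forward selection: grow the core one accession at a time.

    Assumption: the search is seeded with the most distant pair. At each
    step the subset size is fixed, so adding the candidate with the largest
    summed distance to the current core maximizes the average pairwise
    distance of the enlarged core.
    """
    n = len(dist)
    first, second = max(combinations(range(n), 2),
                        key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < core_size:
        best = max(remaining,
                   key=lambda c: sum(dist[c][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: four homozygous accessions scored at four biallelic loci.
genotypes = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
]
dist = distance_matrix(genotypes)
core = sfs_core(dist, core_size=3)
score = mean_pairwise_distance(dist, core)
```

A variant with a random first pair, in the spirit of SFSSemi, would replace the seeding step with a randomly drawn pair of accessions. That removes the need to scan every pair of accessions for the seed, which matters when the collection holds thousands of entries.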


Corresponding author

*Corresponding author. E-mail:




Supplementary materials

Acuña-Matamoros and Reyes-Valdés supplementary material
Tables S1-S2

 PDF (92 KB)
