  • Print publication year: 2016
  • Online publication date: June 2016

7 - Perspective: Biodiversity and the (data) beast

from Part II - Next Generation Biodiversity Science



The challenges faced in the analysis of high-throughput sequencing data are discussed so frequently that the issues have become palpable stereotypes. Phrases such as ‘data deluge’, ‘hockey stick graph’ and ‘bioinformatics bottleneck’ are ubiquitous to the point of spawning an internet bingo card of overused sound bites for audiences to check off during seminars. Yet for all the discussion of these challenges, the dialogue about potential solutions is either absent or wildly speculative. In the sequencing world, the game is changing and no one knows how to make the next play. Computational pipelines progress slowly compared to the pace of sequencing technology, with each new platform requiring updated iterations of code and new empirical tests of error rates and data formats.

In spite of the myriad challenges left to surmount, high-throughput sequencing has already transformed and accelerated the pace of biodiversity research. Our current bioinformatic capabilities have been hard-won: characterizing and grappling with fundamentally different sequencing chemistries and order-of-magnitude increases in file size have required substantial initial investments. The infancy of high-throughput fields means that the current biological insights are rudimentary compared to the sophisticated, complex analyses that will become available over the next decade. Yet by simply investigating ecosystems from a new perspective (genome-scale and community-level exploration, versus the narrower genetic and taxonomic questions previously necessitated by lower-throughput Sanger sequencing), we have instantly gained a transformative view of biodiversity and ecological processes. These fledgling insights are already unprecedented, and the steadily increasing breadth of computational tools continues to widen our capacity for integrative data analysis.

The birth and death of sequencing technologies

Researchers impact sequencing technology almost as much as sequencing technology drives research. The platform currently in vogue may quickly fall out of fashion when a better (and cheaper) option hits the market. Biomedical applications drive the market and design for sequencers, with many large-scale sequencing centres focusing their resources on clinical applications (BGI@UCDavis, the Broad Institute), or species of agricultural or economic importance (BGI's facilities in China). Although many ‘megasequencing’ projects focused on biodiversity are now underway (Table 7.1), more fundamental and blue-skies research questions are inherently at the mercy of the technology and protocols favoured across biomedical fields. The dominance of BGI and the falling cost of sequencing are also prompting a reshuffling of long-term visions for many core facilities.
