Perspective: Biodiversity and the (data) beast

Holly M. Bik; W. Kelley Thomas

doi:10.1017/CBO9781139236355.008

7 - Perspective: Biodiversity and the (data) beast

from Part II - Next Generation Biodiversity Science

Published online by Cambridge University Press: 05 June 2016

Edited by Peter D. Olson, Joseph Hughes and James A. Cotton

Affiliations:
Holly M. Bik: University of California Davis Genome Center, USA
W. Kelley Thomas: University of New Hampshire, USA
Peter D. Olson: Natural History Museum, London
Joseph Hughes: University of Glasgow
James A. Cotton: Wellcome Trust Sanger Institute, Cambridge


Summary

Introduction

The challenges faced in the analysis of high-throughput sequencing data are discussed so frequently that the issues have become palpable stereotypes. Phrases such as ‘data deluge’, ‘hockey stick graph’ and ‘bioinformatics bottleneck’ are ubiquitous to the point of spawning an internet bingo card of overused sound bites for audiences to check off during seminars (http://bit.ly/wYNxrF). Yet for all the discussion of these challenges, potential solutions are either ignored or discussed in wildly speculative terms. In the sequencing world, the game is changing and no one knows how to make the next play. Computational pipelines progress slowly compared with the pace of sequencing technology, with each new platform requiring updated iterations of code and new empirical tests of error rates and data formats.

In spite of the myriad challenges left to surmount, high-throughput sequencing has already transformed and accelerated the pace of biodiversity research. Our current bioinformatic capabilities have been hard-won: characterizing and grappling with fundamentally different sequencing chemistries and order-of-magnitude increases in file size have required substantial initial investments. The infancy of high-throughput fields means that the current biological insights are rudimentary compared to the sophisticated, complex analyses that will become available over the next decade. Yet by simply investigating ecosystems from a new perspective (genome-scale and community-level exploration, versus the narrower genetic and taxonomic questions previously necessitated by lower-throughput Sanger sequencing), we have instantly gained a transformative view of biodiversity and ecological processes. These fledgling insights are already unprecedented, and the steadily increasing breadth of computational tools continues to widen our capacity for integrative data analysis.

The birth and death of sequencing technologies

Researchers impact sequencing technology almost as much as sequencing technology drives research. The platform currently in vogue may quickly fall out of fashion when a better (and cheaper) option hits the market. Biomedical applications drive the market and design for sequencers, with many large-scale sequencing centres focusing their resources on clinical applications (BGI@UCDavis, the Broad Institute), or species of agricultural or economic importance (BGI's facilities in China). Although many ‘megasequencing’ projects focused on biodiversity are now underway (Table 7.1), more fundamental and blue-skies research questions are inherently at the mercy of the technology and protocols favoured across biomedical fields. The dominance of BGI and the falling cost of sequencing are also prompting a reshuffling of long-term visions for many core facilities.

Type: Chapter
Information: Next Generation Systematics, pp. 154–174


Publisher: Cambridge University Press

Print publication year: 2016


