
7 - Perspective: Biodiversity and the (data) beast

from Part II - Next Generation Biodiversity Science

Published online by Cambridge University Press:  05 June 2016

Holly M. Bik, University of California Davis Genome Center, USA
W. Kelley Thomas, University of New Hampshire, USA
Peter D. Olson, Natural History Museum, London
Joseph Hughes, University of Glasgow
James A. Cotton, Wellcome Trust Sanger Institute, Cambridge

Summary

Introduction

The challenges of analysing high-throughput sequencing data are discussed so frequently that the issues have become well-worn stereotypes. Phrases such as ‘data deluge’, ‘hockey stick graph’ and ‘bioinformatics bottleneck’ are so ubiquitous that they have spawned an internet bingo card of overused sound bites for audiences to check off during seminars (http://bit.ly/wYNxrF). Yet for all the discussion of these challenges, dialogue about potential solutions is either neglected or wildly speculative. In the sequencing world, the game is changing and no one knows how to make the next play. Computational pipelines progress slowly compared with the pace of sequencing technology: each new platform requires updated code and new empirical tests of error rates and data formats.
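As a small, hedged illustration of why platform changes force code updates, consider per-base quality scores in FASTQ files. The sketch below assumes the two common ASCII encodings (Phred+33 and the Phred+64 offset used by some older Illumina pipeline versions) and converts each quality score Q into an error probability P = 10^(-Q/10). The function names `decode_qualities` and `expected_errors` and the `PHRED_OFFSETS` table are hypothetical, written for this example only; they are not taken from the chapter or from any particular toolkit.

```python
# Quality-string offsets for two common FASTQ encodings.
# Hypothetical lookup table for illustration; real pipelines must
# detect or be told which encoding a given platform produced.
PHRED_OFFSETS = {"phred33": 33, "phred64": 64}


def decode_qualities(quality_string: str, encoding: str = "phred33") -> list[int]:
    """Convert an ASCII FASTQ quality string into integer Phred scores."""
    offset = PHRED_OFFSETS[encoding]
    scores = [ord(ch) - offset for ch in quality_string]
    # A negative score means the string cannot be in this encoding.
    if any(q < 0 for q in scores):
        raise ValueError(f"quality string is not valid {encoding}")
    return scores


def expected_errors(quality_string: str, encoding: str = "phred33") -> float:
    """Sum the per-base error probabilities P = 10^(-Q/10) for one read."""
    return sum(10 ** (-q / 10) for q in decode_qualities(quality_string, encoding))


if __name__ == "__main__":
    read_quals = "IIII"
    # The same four characters mean Q40 under Phred+33 but Q9 under Phred+64.
    print(decode_qualities(read_quals, "phred33"), expected_errors(read_quals, "phred33"))
    print(decode_qualities(read_quals, "phred64"), expected_errors(read_quals, "phred64"))
```

Read with the wrong offset, the same quality string changes the estimated error burden of a read by more than three orders of magnitude, which is the kind of silent format shift that has to be re-tested whenever a new platform or pipeline version appears.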

In spite of the myriad challenges left to surmount, high-throughput sequencing has already transformed and accelerated biodiversity research. Our current bioinformatic capabilities have been hard-won: characterizing and grappling with fundamentally different sequencing chemistries and order-of-magnitude increases in file size have required substantial initial investments. Because high-throughput fields are still in their infancy, current biological insights are rudimentary compared with the sophisticated, complex analyses that will become available over the next decade. Yet simply by investigating ecosystems from a new perspective (genome-scale and community-level exploration, rather than the narrower genetic and taxonomic questions imposed by lower-throughput Sanger sequencing), we have immediately gained a transformative view of biodiversity and ecological processes. These fledgling insights are already unprecedented, and the steadily growing range of computational tools continues to widen our capacity for integrative data analysis.

The birth and death of sequencing technologies

Researchers shape sequencing technology almost as much as sequencing technology drives research. The platform currently in vogue may quickly fall out of fashion when a better (and cheaper) option reaches the market. Biomedical applications drive both the market for sequencers and their design, with many large-scale sequencing centres focusing their resources on clinical applications (BGI@UCDavis, the Broad Institute) or on species of agricultural or economic importance (BGI's facilities in China). Although many ‘megasequencing’ projects focused on biodiversity are now under way (Table 7.1), more fundamental, blue-skies research questions are inherently at the mercy of the technologies and protocols favoured across biomedical fields. The dominance of BGI and the falling cost of sequencing are also prompting many core facilities to reshuffle their long-term visions.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2016


