High-throughput sequence analyses

Tim J. Stevens; Wayne Boucher

doi:10.1017/CBO9780511843556.018

17 - High-throughput sequence analyses

Published online by Cambridge University Press: 05 February 2015


Affiliations:
Tim J. Stevens: MRC Laboratory of Molecular Biology, Cambridge
Wayne Boucher: University of Cambridge


Summary

High-throughput sequencing

As the cost of determining nucleic acid sequences falls, sequencing is being used in an ever wider range of contexts. Rather than only determining the genome sequence of an organism, high-throughput techniques allow researchers to investigate much more, such as the variation among individuals within a population, the level of expression of individual genes in a given sample (e.g. by detecting RNAs) and the sequences that are bound to particular protein components. A sequencing run on one of the latest-generation sequencing machines may generate many gigabases (>10^9 bp) of data, so much of the task for bioinformatics is to make sense of the raw sequence data: to put it into a genomic, biological context. For organisms with a known genomic sequence, the primary task when processing high-throughput sequence data is simply to map the relatively short stretches of sequence that come from the sequencing machine, called 'reads', to a reference genome. Only then can the detected sequences be understood. Once newly acquired sequences are mapped on to the known chromosomes, the whole database of information that annotates the genome, such as the positions of genes and regulatory sequences, indicates which DNA features were detected. In this chapter we give an introduction to various basic computational procedures involving high-throughput sequence data which can be achieved, or at least handled, using Python. Because this is a vast and rapidly expanding subject we can only touch lightly on the core concepts here, though we hope to have provided solid starting points for further development.
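To make the idea of mapping reads and then consulting the genome annotation concrete, here is a minimal Python sketch. It is not the chapter's own code: the reference sequence, the read list, the annotation dictionary and the function names map_reads and features_at are all hypothetical, and the search is a naive exact match on the forward strand only. Real analyses would normally hand this step to a dedicated short-read aligner that tolerates mismatches and scales to whole genomes.

```python
# Toy illustration only: exact, forward-strand matching on made-up data.

def map_reads(reference, reads):
    """Return {read: [0-based start positions]} for exact matches."""
    hits = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            # Continue searching one base further along to catch overlaps
            start = reference.find(read, start + 1)
        hits[read] = positions
    return hits


def features_at(position, length, annotations):
    """Return names of features overlapping [position, position + length)."""
    end = position + length
    return [name for name, (f_start, f_end) in annotations.items()
            if position < f_end and f_start < end]


# Hypothetical reference and annotation (0-based, half-open intervals)
reference = "ATGCGTACGTTAGCCGATAGGCTAACGT"
annotations = {"geneA": (0, 10), "promoterB": (14, 22)}
reads = ["CGTACG", "GATAGG", "TTTTTT"]

hits = map_reads(reference, reads)
for read, positions in hits.items():
    for pos in positions:
        print(read, pos, features_at(pos, len(read), annotations))
```

A read that maps nowhere, such as the third one here, simply gets an empty list of positions. The interval test in features_at is the usual overlap check for half-open ranges.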

Type: Chapter
Information: Python Programming for Biology: Bioinformatics and Beyond, pp. 341–360

DOI: https://doi.org/10.1017/CBO9780511843556.018

Publisher: Cambridge University Press

Print publication year: 2015


References

Cock, P.J., Fields, C.J., Goto, N., Heuer, M.L., and Rice, P.M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38(6): 1767–1771.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3): R25.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25: 1754–1760.

Li, H., Handsaker, B., Wysoker, A., et al.; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078–2079.
