Bayesian Model-Based Approaches for Solexa Sequencing Data

doi:10.1017/CBO9781139226448.007

6 - Bayesian Model-Based Approaches for Solexa Sequencing Data

Published online by Cambridge University Press: 05 June 2013

Riten Mitra, Peter Mueller and Yuan Ji

Edited by Kim-Anh Do, Zhaohui Steve Qin and Marina Vannucci

Affiliations:
Riten Mitra: University of Texas
Peter Mueller: University of Texas
Yuan Ji: NorthShore University Health-System
Kim-Anh Do: University of Texas, MD Anderson Cancer Center
Zhaohui Steve Qin: Emory University, Atlanta
Marina Vannucci: Rice University, Houston

Summary

Introduction

Recent advances in next-generation sequencing have had a major impact on biological research through high-throughput platforms that generate megabases of sequence data per day. These technologies increase sequencing speed and reduce cost, and they have found applications in genotyping, protein-DNA interactions (Barski et al., 2007; Mikkelsen et al., 2007), transcriptome analysis (Friedländer et al., 2008; Hafner et al., 2008; Vera et al., 2008), and de novo genome assembly (Chaisson and Pevzner, 2008). In this chapter, we focus on the Illumina/Solexa sequencing platform. Data from other technologies have similar characteristics, however, and we expect models similar to the one presented here to be useful for those technologies as well.

Solexa sequencing (www.illumina.com) produces millions of polymerase chain reaction (PCR)-amplified and labeled short-read sequences. For each short read, the measured fluorescent intensities are stored in an I × 4 matrix, where I is the length of the read (e.g., I = 36). Such a matrix corresponds to a colony. The positions i = 1, …, I in the short read are sequenced in cycles by a biochemical procedure called sequencing-by-synthesis. As a result, each row of the colony matrix contains measurements from one cycle of the experiment, in which a single base of the sequence is synthesized. At each cycle, all four nucleotides (A, C, G, and T), labeled with four different fluorescent dyes, are probed, producing a vector of four fluorescent intensity scores.
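
To make the layout of a colony matrix concrete, the following is a minimal Python sketch, not the chapter's method. It stores one short read as an I × 4 array (one row per cycle, one column per nucleotide) and calls each base by taking the largest intensity in its row. The column order A, C, G, T, the function name naive_base_calls, and the toy intensity values are all assumptions made for illustration. The maximum-intensity rule is only a naive baseline and is not the Bayesian model developed in the chapter.

```python
import numpy as np

# Assumed column order for the four dye channels (illustrative only).
BASES = ("A", "C", "G", "T")


def naive_base_calls(colony):
    """Call one base per cycle by picking the channel with the largest intensity.

    `colony` is an I x 4 array: row i holds the four fluorescent
    intensities measured at cycle i. This is a naive baseline, not the
    model-based approach discussed in the chapter.
    """
    colony = np.asarray(colony, dtype=float)
    if colony.ndim != 2 or colony.shape[1] != 4:
        raise ValueError("expected an I x 4 matrix of intensities")
    brightest = colony.argmax(axis=1)  # index of the brightest channel per cycle
    return "".join(BASES[j] for j in brightest)


# Toy colony with I = 5 cycles; the intensity values are invented.
toy_colony = np.array([
    [950.0, 120.0,  80.0,  60.0],
    [100.0, 870.0, 140.0,  90.0],
    [ 70.0, 110.0, 910.0, 130.0],
    [ 60.0,  90.0, 150.0, 880.0],
    [820.0, 200.0, 100.0,  75.0],
])

read = naive_base_calls(toy_colony)
print(read)
```

A real read on this platform would have more rows (e.g., I = 36), and the intensities in a row are typically not this cleanly separated, which is one reason a probabilistic treatment of the intensity vectors is of interest.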

Type: Chapter
Information: Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data, pp. 126-137

DOI: https://doi.org/10.1017/CBO9781139226448.007

Publisher: Cambridge University Press

Print publication year: 2013
