Statistical Analysis of Mapped Reads from mRNA-Seq Data

doi:10.1017/CBO9781139226448.005

4 - Statistical Analysis of Mapped Reads from mRNA-Seq Data

Published online by Cambridge University Press: 05 June 2013

Ernest Turro and Alex Lewin

Edited by Kim-Anh Do, Zhaohui Steve Qin and Marina Vannucci


Ernest Turro: University of Cambridge
Alex Lewin: Imperial College London
Kim-Anh Do: University of Texas, MD Anderson Cancer Center
Zhaohui Steve Qin: Emory University, Atlanta
Marina Vannucci: Rice University, Houston


Summary

Background

RNA Biology

A common and important aim in the field of genomics is the characterization of populations of RNA molecules. Investigators typically wish to uncover the sequence and concentration of each RNA in a set of samples, either as an objective in its own right or as an early step in a larger analysis pipeline. Later steps might include the identification of differentially expressed genes between treatment and control groups, the clustering of genes into sets that putatively share regulatory pathways, or the association of genomic polymorphisms with patterns of expression.

Roughly 4% of the RNAs in a typical unprocessed RNA sample are messenger RNAs (mRNAs), which code for proteins. The non-coding RNAs consist largely of ribosomal and transfer RNAs, which are involved in protein synthesis but do not code for proteins themselves; the remainder comprises a set of less abundant types of molecules with diverse functions. Because of their direct role in protein synthesis, mRNAs have been in the limelight of genomic research. Protein-coding genes are transcribed by RNA polymerase from their 5′ (upstream) end to their 3′ (downstream) end to produce pre-mRNA. As this process takes place, certain regions (introns) may be spliced out of the pre-mRNA and discarded, leaving behind an mRNA sequence of connected exons, known as an isoform. A single gene may give rise to several different combinations of exons, a phenomenon known as alternative splicing, and different isoforms may have different transcript start (5′) and end (3′) sites. This allows a single gene to produce multiple distinct transcripts and contributes to the phenotypic complexity of eukaryotes. The collection of possible transcripts produced by a single gene is known as the gene model.
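To make the notion of a gene model concrete, the following minimal sketch represents a single gene as a set of exons and its isoforms as ordered selections of those exons. All names and coordinates (`Exon`, `EXONS`, `GENE_MODEL`, `isoform_A`, `isoform_B`) are hypothetical and chosen purely for illustration; they are not taken from the chapter or from any real annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """An exon given by 1-based inclusive coordinates (hypothetical values)."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Hypothetical gene with three exons, listed from 5' to 3'.
EXONS = {
    "e1": Exon(100, 199),   # 100 bases
    "e2": Exon(300, 349),   # 50 bases
    "e3": Exon(500, 649),   # 150 bases
}

# The gene model: each isoform is an ordered tuple of exon ids.
# isoform_B skips exon e2, a simple case of alternative splicing.
GENE_MODEL = {
    "isoform_A": ("e1", "e2", "e3"),
    "isoform_B": ("e1", "e3"),
}


def transcript_length(isoform: str) -> int:
    """Length of the spliced transcript: the sum of its exon lengths."""
    return sum(EXONS[exon_id].length for exon_id in GENE_MODEL[isoform])


def shared_exons(isoform_a: str, isoform_b: str) -> set:
    """Exon ids present in both isoforms."""
    return set(GENE_MODEL[isoform_a]) & set(GENE_MODEL[isoform_b])


if __name__ == "__main__":
    for name in GENE_MODEL:
        print(name, transcript_length(name))
    print(sorted(shared_exons("isoform_A", "isoform_B")))
```

In this toy example the two isoforms share exons e1 and e3, so sequence drawn from those exons is compatible with either transcript, while sequence from e2 can only have come from the longer isoform.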

Type: Chapter
Information: Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data, pp. 77-104

DOI: https://doi.org/10.1017/CBO9781139226448.005

Publisher: Cambridge University Press

Print publication year: 2013
