Book contents
- Frontmatter
- Contents
- List of Contributors
- Preface
- 1 An Introduction to Next-Generation Biological Platforms
- 2 An Introduction to The Cancer Genome Atlas
- 3 DNA Variant Calling in Targeted Sequencing Data
- 4 Statistical Analysis of Mapped Reads from mRNA-Seq Data
- 5 Model-Based Methods for Transcript Expression-Level Quantification in RNA-Seq
- 6 Bayesian Model-Based Approaches for Solexa Sequencing Data
- 7 Statistical Aspects of ChIP-Seq Analysis
- 8 Bayesian Modeling of ChIP-Seq Data from Transcription Factor to Nucleosome Positioning
- 9 Multivariate Linear Models for GWAS
- 10 Bayesian Model Averaging for Genetic Association Studies
- 11 Whole-Genome Multi-SNP-Phenotype Association Analysis
- 12 Methods for the Analysis of Copy Number Data in Cancer Research
- 13 Bayesian Models for Integrative Genomics
- 14 Bayesian Graphical Models for Integrating Multiplatform Genomics Data
- 15 Genetical Genomics Data: Some Statistical Problems and Solutions
- 16 A Bayesian Framework for Integrating Copy Number and Gene Expression Data
- 17 Application of Bayesian Sparse Factor Analysis Models in Bioinformatics
- 18 Predicting Cancer Subtypes Using Survival-Supervised Latent Dirichlet Allocation Models
- 19 Regularization Techniques for Highly Correlated Gene Expression Data with Unknown Group Structure
- 20 Optimized Cross-Study Analysis of Microarray-Based Predictors
- 21 Functional Enrichment Testing: A Survey of Statistical Methods
- 22 Discover Trend and Progression Underlying High-Dimensional Data
- 23 Bayesian Phylogenetics Adapts to Comprehensive Infectious Disease Sequence Data
- Index
- Plate section
4 - Statistical Analysis of Mapped Reads from mRNA-Seq Data
Published online by Cambridge University Press: 05 June 2013
Summary
Background
RNA Biology
A common and important aim in the field of genomics is the characterization of populations of RNA molecules. Investigators within the field typically wish to uncover the sequence and concentration of each RNA in a set of samples, either as an objective in its own right or as an early step in a larger analysis pipeline. Later steps might include the identification of differentially expressed genes between treatment and control groups, the clustering of genes into sets sharing putative, common regulatory pathways, or the association of genomic polymorphisms with patterns of expression.
Roughly 4% of the RNAs in a typical unprocessed RNA sample are messenger RNAs (mRNAs), which code for proteins. Non-coding RNAs largely comprise ribosomal and transfer RNAs, which are involved in protein synthesis but do not code for proteins themselves. The remainder of the non-coding RNAs includes a set of less abundant types of molecules with diverse functions. As a result of their direct role in protein synthesis, mRNAs have been in the limelight of genomic research. Protein-coding genes are transcribed by RNA polymerase from their 5′ (upstream) end to their 3′ (downstream) end to produce pre-mRNA. As this process takes place, certain regions (introns) may be spliced out of the pre-mRNA and discarded, leaving behind only an mRNA sequence of connected exons, known as an isoform. Different combinations of exons may be retained, a phenomenon known as alternative splicing, and different isoforms may have different 5′ transcript start sites and 3′ transcript end sites. This allows a single gene to produce multiple distinct transcripts and contributes to the phenotypic complexity of eukaryotes. The collection of possible transcripts produced by a single gene is known as the gene model.
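The relationship between exons, isoforms, and the gene model can be made concrete with a small sketch. The example below is purely illustrative and is not taken from the chapter: the sequence, the exon coordinates, the isoform names, and the helper `splice` are all hypothetical, and the code treats splicing as simple concatenation of exon substrings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Exon:
    """A retained region of the pre-mRNA (hypothetical toy representation)."""
    start: int  # 0-based, inclusive
    end: int    # exclusive


def splice(pre_mrna: str, exons: List[Exon]) -> str:
    """Join the chosen exons in 5'-to-3' order, discarding everything else."""
    ordered = sorted(exons, key=lambda e: e.start)
    return "".join(pre_mrna[e.start:e.end] for e in ordered)


# Toy pre-mRNA: uppercase marks exonic bases, lowercase marks spliced-out bases.
pre_mrna = "AUGGCC" + "guaag" + "UUUCAA" + "guccag" + "GGAUAA"

exon1 = Exon(0, 6)
exon2 = Exon(11, 17)
exon3 = Exon(23, 29)

# The gene model: every isoform (exon combination) this toy gene can produce.
gene_model: Dict[str, List[Exon]] = {
    "isoform_full": [exon1, exon2, exon3],
    "isoform_skip_exon2": [exon1, exon3],  # alternative splicing skips exon 2
}

transcripts = {name: splice(pre_mrna, exons) for name, exons in gene_model.items()}

for name, seq in transcripts.items():
    print(name, seq)
```

Here one gene yields two distinct transcripts that share exons 1 and 3, which illustrates why reads falling in shared exons cannot, on their own, be attributed to a single isoform.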
- Type: Chapter
- Information: Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data, pp. 77-104. Publisher: Cambridge University Press. Print publication year: 2013.