Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Pairwise alignment
- 3 Markov chains and hidden Markov models
- 4 Pairwise alignment using HMMs
- 5 Profile HMMs for sequence families
- 6 Multiple sequence alignment methods
- 7 Building phylogenetic trees
- 8 Probabilistic approaches to phylogeny
- 9 Transformational grammars
- 10 RNA structure analysis
- 11 Background on probability
- References
- Index
2 - Pairwise alignment
Published online by Cambridge University Press: 06 January 2010
Summary
The notion of sequence similarity is perhaps the most fundamental concept in biological sequence analysis. Just as the similarity of morphological traits served as evidence of genetic and functional relationships between species in classical genetics and biology, biological sequence similarity frequently indicates structural and functional conservation among evolutionarily related DNA and protein sequences. Introducing a biologically relevant quantitative measure of sequence similarity, the similarity score, is not a trivial task. Nor is the second task: developing algorithms that find the alignment of two sequences with the best possible score under a given scoring system. The third necessary component of the computational analysis of sequence similarity is a method for evaluating the statistical significance of an alignment. Such a method, which establishes the cut-off values above which observed scores are statistically significant, can be applied only once the statistical distribution of similarity scores has been determined, analytically or computationally.
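As an illustration of the second task, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). The function name and the default values for `match`, `mismatch`, and `gap` are assumptions chosen for illustration; they are not taken from the chapter, and a realistic scoring system would use a substitution matrix instead of a flat match/mismatch score.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of x and y.

    Assumes a flat match/mismatch score and a linear gap penalty;
    these parameter values are illustrative, not from the chapter.
    """
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                F[i - 1][j] + gap,    # x[i-1] aligned to a gap
                F[i][j - 1] + gap,    # y[j-1] aligned to a gap
            )
    return F[n][m]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table is filled in O(nm) time; recovering the alignment itself would additionally require storing traceback pointers, which this sketch omits.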
Chapter 2 of BSA (Biological Sequence Analysis) includes twelve problems that require knowledge of the concepts and properties of pairwise alignment algorithms. Because of its practical importance, this is traditionally the topic best known to biologists. Indeed, the initial characterization of any DNA or protein sequence starts with a BLAST analysis, which uses a highly efficient heuristic pairwise alignment algorithm to search a database for homologous sequences.
Nine additional problems provide further background on the theory of protein evolution behind the log-odds scores of amino acid substitutions, as well as on the models used to assess the statistical significance of observed sequence similarity scores.
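For orientation, the log-odds score of a substitution is commonly written as follows. The notation is an assumption, not taken from this summary: $p_{ab}$ denotes the probability of seeing residues $a$ and $b$ aligned in related sequences, and $q_a$, $q_b$ denote their background frequencies.

```latex
s(a, b) = \log \frac{p_{ab}}{q_a \, q_b},
\qquad
S = \sum_{i} s(x_i, y_i)
```

Under this convention the score $S$ of an ungapped alignment of $x$ and $y$ is the sum of the per-position scores, so a positive $S$ favors the hypothesis that the sequences are related over the hypothesis that they are independent.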
Type: Chapter
Information: Problems and Solutions in Biological Sequence Analysis, pp. 24-66
Publisher: Cambridge University Press
Print publication year: 2006