Book contents
- Frontmatter
- Contents
- Preface
- SECTION I INTRODUCTION AND BIOLOGICAL DATABASES
- SECTION II SEQUENCE ALIGNMENT
- 3 Pairwise Sequence Alignment
- 4 Database Similarity Searching
- 5 Multiple Sequence Alignment
- 6 Profiles and Hidden Markov Models
- 7 Protein Motifs and Domain Prediction
- SECTION III GENE AND PROMOTER PREDICTION
- SECTION IV MOLECULAR PHYLOGENETICS
- SECTION V STRUCTURAL BIOINFORMATICS
- SECTION VI GENOMICS AND PROTEOMICS
- APPENDIX
- Index
- Plate section
- References
6 - Profiles and Hidden Markov Models
Published online by Cambridge University Press: 05 June 2012
Summary
Multiple sequence alignments can be used to identify related sequences in databases by constructing position-specific scoring matrices (PSSMs), profiles, and hidden Markov models (HMMs). These statistical models capture the frequency of each amino acid or nucleotide residue at each position of a multiple alignment, so they can be treated as a consensus for a given sequence family. The “consensus” is not a single sequence, however, but a model that captures both the observed frequencies and the predicted frequencies of characters not observed in the alignment. Because such models allow partial matches with a query sequence, they can detect more distant members of the same sequence family, which increases the sensitivity of database searches. This chapter covers the basics of these statistical models and then discusses their applications.
POSITION-SPECIFIC SCORING MATRICES
A PSSM is a table that holds probability information for the amino acids or nucleotides at each position of an ungapped multiple sequence alignment. The matrix resembles the substitution matrices discussed in Chapter 3, but is more complex in that it also carries positional information from the alignment. In such a table, the rows represent the residue positions of a particular multiple alignment and the columns represent the residue types, or vice versa (Fig. 6.1). The values in the table are log odds scores of the residues, calculated from the multiple alignment.
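The construction described above can be illustrated with a minimal Python sketch. It is not the book's own procedure. It assumes a uniform background frequency for each residue, a simple background-weighted pseudocount (so that unobserved residues receive a small nonzero frequency), and base-2 logarithms. The function names `build_pssm` and `score_sequence` and the toy alignment are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", background=None, pseudocount=1.0):
    """Build a log odds PSSM from an ungapped alignment.

    Returns a list with one dict per alignment column, mapping each
    residue in `alphabet` to its log2 odds score at that position.
    Assumption: a uniform background unless one is supplied.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    if background is None:
        background = {r: 1.0 / len(alphabet) for r in alphabet}

    n = len(alignment)
    pssm = []
    for pos in range(length):
        # Observed residue counts in this column (Counter returns 0 if absent).
        counts = Counter(seq[pos] for seq in alignment)
        column = {}
        for r in alphabet:
            # Pseudocount-adjusted frequency: unobserved residues get a
            # small predicted frequency instead of zero.
            freq = (counts[r] + pseudocount * background[r]) / (n + pseudocount)
            # Log odds: frequency at this position relative to background.
            column[r] = math.log2(freq / background[r])
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific log odds scores for a sequence."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[r] for column, r in zip(pssm, seq))


# Toy ungapped nucleotide alignment (illustrative data only).
toy_alignment = ["ATGC", "ATGA", "ATCC", "AAGC"]
toy_pssm = build_pssm(toy_alignment)
```

In this sketch a residue that is over-represented in a column relative to the background gets a positive score and an under-represented one gets a negative score, so `score_sequence(toy_pssm, "ATGC")` comes out higher than `score_sequence(toy_pssm, "CCCC")`. Real tools typically use more elaborate pseudocount schemes and sequence weighting.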
Type: Chapter
Information: Essential Bioinformatics, pp. 75-84. Publisher: Cambridge University Press. Print publication year: 2006