Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgments
- 1 The Central Dogma
- 2 RNA Secondary Structure
- 3 Comparing DNA Sequences
- 4 Predicting Species: Statistical Models
- 5 Substitution Matrices for Amino Acids
- 6 Sequence Databases
- 7 Local Alignment and the BLAST Heuristic
- 8 Statistics of BLAST Database Searches
- 9 Multiple Sequence Alignment I
- 10 Multiple Sequence Alignment II
- 11 Phylogeny Reconstruction
- 12 Protein Motifs and PROSITE
- 13 Fragment Assembly
- 14 Coding Sequence Prediction with Dicodons
- 15 Satellite Identification
- 16 Restriction Mapping
- 17 Rearranging Genomes: Gates and Hurdles
- A Drawing RNA Cloverleaves
- B Space-Saving Strategies for Alignment
- C A Data Structure for Disjoint Sets
- D Suggestions for Further Reading
- Bibliography
- Index
5 - Substitution Matrices for Amino Acids
Published online by Cambridge University Press: 05 June 2012
Summary
In Chapter 3 we considered the problem of aligning DNA sequences that had been read from the same source but with errors introduced by laboratory procedures. Rather arbitrarily, we assigned a reward of +1 for a match, a penalty of -1 for a mismatch, and a penalty of -2 for a gap. In this chapter, we will examine how to align protein sequences that differ as a result of evolution itself rather than of experimental error. The outcome will be a method for constructing substitution matrices for scoring alignments. Potentially, these matrices can assign a different reward or penalty to each of the 210 possible unordered pairs of amino acids (190 pairs of distinct amino acids plus 20 pairs of identical ones) that may appear in a column of an alignment.
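As an illustration of how the scoring scheme generalizes, here is a minimal Python sketch (the book itself uses Perl) that scores an alignment column by column. A column with a gap costs -2, as in Chapter 3. Any other column is looked up in a symmetric substitution matrix keyed by unordered pairs, so one entry serves both (a, b) and (b, a). The helper names (`pair_key`, `placeholder_matrix`, `score_alignment`) are hypothetical. The matrix values are placeholders that reproduce the Chapter 3 +1/-1 scheme; they are not the evolutionary scores this chapter goes on to construct.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids, one-letter codes in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"
GAP_PENALTY = -2  # Chapter 3 penalty for a column containing a gap


def pair_key(a, b):
    """Order-independent key, so (a, b) and (b, a) share one matrix entry."""
    return (a, b) if a <= b else (b, a)


def placeholder_matrix(match=1, mismatch=-1):
    """Hypothetical symmetric matrix mimicking the Chapter 3 scheme.

    A real substitution matrix would give each unordered pair its own
    score; these uniform values are placeholders only.
    """
    return {
        pair_key(a, b): (match if a == b else mismatch)
        for a, b in combinations_with_replacement(AMINO_ACIDS, 2)
    }


def score_alignment(top, bottom, matrix, gap=GAP_PENALTY):
    """Sum the column scores of two equal-length aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == GAP or b == GAP:
            total += gap  # gap column
        else:
            total += matrix[pair_key(a, b)]  # substitution column
    return total


matrix = placeholder_matrix()
print(len(matrix))                                  # 210
print(score_alignment("HEAGAW", "HE-GAW", matrix))  # 5 matches, 1 gap: 3
```

Storing only unordered pairs enforces symmetry by construction. Replacing the placeholder values with chapter-derived scores changes the scores but leaves `score_alignment` untouched.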
The function of a protein is determined by its shape and charge distribution, not by the exact sequence of amino acids. During DNA replication, various mutations can alter the protein produced by a gene. Some types of mutations are:
- point mutations, in which the machinery of replication randomly substitutes an incorrect nucleotide for the correct one;
- indels, or insertions and deletions, in which extra bases are randomly inserted or bases are omitted;
- translocations, in which longer pieces of DNA, possibly including one or more entire genes, are moved from one part of the chromosome to another part, or to another chromosome; and
- duplications, in which long pieces of DNA are copied and integrated into a chromosome.
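To make the four kinds of mutation concrete, the following toy Python sketch applies each one to a DNA string. The function names are hypothetical, and the randomness uses a seeded `random.Random` purely for reproducibility. This is not a model of real mutation rates. One simplification should be noted: `translocation` only moves a segment within a single string. Moving a segment to another chromosome would require removing it from one string and inserting it into a second.

```python
import random

BASES = "ACGT"


def point_mutation(dna, rng):
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(dna))
    new_base = rng.choice([b for b in BASES if b != dna[i]])
    return dna[:i] + new_base + dna[i + 1:]


def insertion(dna, rng, length=1):
    """Insert `length` random bases at a random position (an indel)."""
    i = rng.randrange(len(dna) + 1)
    extra = "".join(rng.choice(BASES) for _ in range(length))
    return dna[:i] + extra + dna[i:]


def deletion(dna, rng, length=1):
    """Omit `length` consecutive bases starting at a random position (an indel)."""
    i = rng.randrange(len(dna) - length + 1)
    return dna[:i] + dna[i + length:]


def translocation(dna, start, end, dest):
    """Move dna[start:end] to index `dest` of the remaining sequence."""
    piece = dna[start:end]
    rest = dna[:start] + dna[end:]
    return rest[:dest] + piece + rest[dest:]


def duplication(dna, start, end):
    """Copy dna[start:end] and place the copy right after the original."""
    return dna[:end] + dna[start:end] + dna[end:]


rng = random.Random(42)
gene = "ATGGCCATTGTAATGGGCCGC"
print(point_mutation(gene, rng))
print(translocation("AAACCCGGG", 3, 6, 6))  # AAAGGGCCC
print(duplication("AAACCCGGG", 3, 6))       # AAACCCCCCGGG
```

Point mutations preserve length while indels change it. Repeated point mutations and indels, applied over many generations, are what produce the sequence differences that evolutionary substitution matrices try to score.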
- Type: Chapter
- Information: Genomic Perl: From Bioinformatics Basics to Working Code, pp. 55-71
- Publisher: Cambridge University Press
- Print publication year: 2002