Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgements
- 1 Prologue
- 2 A beginners’ guide
- 3 Python basics
- 4 Program control and logic
- 5 Functions
- 6 Files
- 7 Object orientation
- 8 Object data modelling
- 9 Mathematics
- 10 Coding tips
- 11 Biological sequences
- 12 Pairwise sequence alignments
- 13 Multiple-sequence alignments
- 14 Sequence variation and evolution
- 15 Macromolecular structures
- 16 Array data
- 17 High-throughput sequence analyses
- 18 Images
- 19 Signal processing
- 20 Databases
- 21 Probability
- 22 Statistics
- 23 Clustering and discrimination
- 24 Machine learning
- 25 Hard problems
- 26 Graphical interfaces
- 27 Improving speed
- Appendices
- Glossary
- Index
- Plate section
- References
12 - Pairwise sequence alignments
Published online by Cambridge University Press: 05 February 2015
Summary
Sequence alignment
The alignment of biological sequences is probably the most widely used operation in bioinformatics. In essence, sequences are aligned so that we can determine how similar they are. From this similarity all sorts of useful information can follow, such as whether two sequences are related by evolution (they have a common ancestor) or whether they have a similar biological function. The process of comparison is called alignment because the trickiest part is to say which parts of two sequences are equivalent to one another, i.e. how residues of the different sequences can be paired up. Usually when we align sequences we seek the best alignment out of the vast number of possible ones by finding the combination of residue pairs, one from each sequence, which gives the highest overall similarity score.
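As a rough illustration of "finding the combination of residue pairs which gives the highest overall score", the following is a minimal sketch of a global alignment by dynamic programming (the Needleman-Wunsch approach). It is not taken from the chapter: the function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, and a real analysis would normally use a substitution matrix and a more careful gap penalty.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best global alignment.

    The scoring values are illustrative assumptions, not recommended settings.
    """
    n, m = len(seq_a), len(seq_b)

    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # seq_a prefix aligned only against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # seq_b prefix aligned only against gaps

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # pair two residues
                              score[i - 1][j] + gap,       # gap in seq_b
                              score[i][j - 1] + gap)       # gap in seq_a

    # Trace back from the bottom-right corner to recover one best alignment
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append('-')
            i -= 1
        else:
            aligned_a.append('-')
            aligned_b.append(seq_b[j - 1])
            j -= 1

    return score[n][m], ''.join(reversed(aligned_a)), ''.join(reversed(aligned_b))


if __name__ == '__main__':
    best, top, bottom = global_align('ACGT', 'AGT')
    print(best)
    print(top)
    print(bottom)
```

With these assumed scores, aligning `ACGT` with `AGT` pairs three identical residues and inserts one gap opposite the `C`. When several alignments share the top score, the traceback returns only one of them.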
Once a sequence alignment has been achieved, and assuming you trust the results, you can treat the aligned regions as having a degree of equivalency. If the alignment is good enough you might be able to say, for example, that two DNA sequences relate to the same kind of gene, despite the nucleotides not being exactly the same. It should always be remembered that a sequence alignment can only give a limited amount of information about the underlying biology; nevertheless, it is often an excellent starting point. Even where the knowledge gained is distinctly incomplete, a sequence alignment is quick to perform and often helps to guide experiments. With one simple database search, i.e. by doing alignments against a database of well-studied sequences, you might significantly narrow down the possibilities of what a section of DNA or protein could be, or say what it definitely is not. Sequence alignments are also done in a laboratory setting to guide procedures, for example to determine which part of a protein to investigate.
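One simple way to put a number on how equivalent two aligned sequences are is the percentage of paired residues that are identical. The sketch below is an illustration rather than the chapter's own code: the function name `percent_identity` is hypothetical, the input is assumed to be two already-aligned strings of equal length with `-` marking gaps, and counting only the gap-free columns is just one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap_char='-'):
    """Percentage of gap-free aligned columns in which the residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError('aligned sequences must have the same length')

    paired = 0     # columns where both sequences have a residue
    identical = 0  # paired columns where the residues match

    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap_char or res_b == gap_char:
            continue  # skip columns that contain a gap
        paired += 1
        if res_a == res_b:
            identical += 1

    if paired == 0:
        return 0.0  # nothing was paired, so report no identity

    return 100.0 * identical / paired


if __name__ == '__main__':
    print(percent_identity('GATTACA', 'GACTATA'))
```

A high value on its own does not establish a biological relationship; as the text notes, the alignment has to be trusted first, and short or low-complexity sequences can match well by chance.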
- Type: Chapter
- Information: Python Programming for Biology: Bioinformatics and Beyond, pp. 208-231
- Publisher: Cambridge University Press
- Print publication year: 2015