Book contents
- Frontmatter
- Contents
- Foreword by Steven Salzberg
- Preface
- Acknowledgements
- 1 Introduction
- 2 Mathematical preliminaries
- 3 Overview of computational gene prediction
- 4 Gene finder evaluation
- 5 A toy exon finder
- 6 Hidden Markov models
- 7 Signal and content sensors
- 8 Generalized hidden Markov models
- 9 Comparative gene finding
- 10 Machine-learning methods
- 11 Tips and tricks
- 12 Advanced topics
- Appendix
- References
- Index
Foreword by Steven Salzberg
Published online by Cambridge University Press: 05 June 2012
Summary
When Frederick Sanger sequenced the very first genome – the bacteriophage ϕ-X174 – in 1977, it was clear that DNA sequencing offered a dramatically faster way to find genes than earlier, traditional mapping methods. The small phage genome spans just 5386 bases, about 95% of which is used to encode 11 genes. For this and other viruses, gene finding is fast and easy: the proteins are encoded virtually end-to-end, sometimes even overlapping one another. The early days of DNA sequencing proceeded slowly but with great excitement, as the new technology was applied to small fragments of DNA from many different species. By the mid-1980s, scientists were attempting to automate the sequencing process, which soon led to larger sequencing projects, and in 1989 the Human Genome Project was launched, with the ambitious goal of sequencing the entire 3 billion base pairs of the 24 human chromosomes over the course of the next 15 years. In 1995 a team at The Institute for Genomic Research (TIGR) sequenced the first genome of a free-living organism, the bacterium Haemophilus influenzae, which at 1.8 million base pairs was considerably larger than any genome that had been sequenced before. The H. influenzae genome was the beginning of an enormous outpouring of DNA sequences, fueled by ever-lower sequencing costs and an ever-increasing thirst for new discoveries, that has now produced hundreds of genomes, both large and small.
- Type: Chapter
- Information: Methods for Computational Gene Prediction, pp. xi-xii
- Publisher: Cambridge University Press
- Print publication year: 2007