In the previous chapter, we saw how computational methods can be used to find pathogenicity islands in DNA. But now we want more! Specifically, we want to find actual genes in these pathogenicity islands so that we can analyze them and determine their function.
In this chapter, you’ll write a program that finds candidate genes in the genome. We say “candidate” because some things that look like genes turn out not to be. In the next and final chapter of this unit, you’ll take the last step to write a program that determines which of the candidate genes are likely to be real. You’ll then use your gene-finding program to identify the genes in a pathogenicity island of Salmonella typhi. Finally, you’ll use a web-based search tool called BLAST to compare these genes to known genes in the GenBank database. This final comparison will allow you to infer the function of some of the genes you’ve identified.
Open Reading Frames and the Central Dogma
Genes carry the instructions for making proteins, which are the main molecules that “do things” in cells. To find genes, we must know something about how they work. As you may recall, protein construction follows the central dogma of molecular biology: the sequence of nucleotides in a gene's DNA is transcribed into the sequence of nucleotides in a messenger RNA (mRNA). This mRNA is then translated, which means it is read in units of three nucleotides called codons. Each codon specifies a particular amino acid, so the sequence of codons in the mRNA determines the sequence of amino acids in the protein.
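The short Python sketch below traces the two steps of the central dogma on a toy sequence. It rests on a few simplifying assumptions. The input DNA is taken to be the coding strand, so its mRNA is the same sequence with U in place of T. Translation uses the standard genetic code, and the three stop codons are shown as `*`. The names `transcribe`, `codons`, and `translate` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, compactly encoded: the 64 codons are generated with
# bases ordered U, C, A, G (first base varies slowest, third base fastest),
# and each is paired with the one-letter amino acid code at the same position.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def codons(mrna: str) -> list[str]:
    """Split mRNA into consecutive codons, dropping any incomplete tail."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def translate(mrna: str) -> str:
    """Translate each codon into its one-letter amino acid code."""
    return "".join(CODON_TABLE[c] for c in codons(mrna))


if __name__ == "__main__":
    mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGA")
    print(mrna)              # AUGGCCAUUGUAAUGGGCCGCUGA
    print(codons(mrna))      # ['AUG', 'GCC', 'AUU', 'GUA', 'AUG', 'GGC', 'CGC', 'UGA']
    print(translate(mrna))   # MAIVMGR*
```

Because the codons are read three nucleotides at a time, from a fixed starting point and without overlap, shifting that starting point by one or two nucleotides produces an entirely different list of codons and a different protein.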