7 - From biopolymers to formal language theory
Published online by Cambridge University Press: 02 December 2009
Introduction
Living systems on our planet rely on the construction of long molecules by linking relatively small units into sequences in which each pair of adjoining units is connected in a uniform manner. The units of polypeptides (proteins) are a set of twenty amino acids. These units are connected by joining the carboxyl group (COOH) of one unit to the amino group (NH2) of the next unit, with a water molecule being released in the process. The units of RNA are a set of four ribonucleotides. These units are connected by joining the phosphate group (PO4 attached at the 5′ carbon) of one unit to the next unit, where it replaces the hydroxyl group (OH attached at the 3′ carbon), with a water molecule being released in the process. The units of single-stranded DNA are a set of four deoxyribonucleotides, joined by the same process as in the case of RNA.
Molecules lie in three-dimensional space, whereas words lie on a line. One may adopt the convention of listing the amino acids of a protein on a line with the free amino group on the left and the free carboxyl group on the right. For both single-stranded RNA and DNA molecules one may adopt the convention of listing their units on a line with the phosphate on the left and the free hydroxyl group on the right. These conventions allow us to model (without ambiguity) these biopolymers as words over finite alphabets: a twenty-letter alphabet of symbols that denote the twenty amino acids, and two four-letter alphabets of symbols denoting the four units of RNA and DNA, respectively.
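The modelling step described above can be illustrated with a minimal Python sketch. It assumes the standard one-letter symbols for the twenty amino acids and for the RNA and DNA units; the chapter extract does not fix a particular choice of symbols, and the names `AMINO_ACIDS`, `RNA`, `DNA`, `is_word_over` and `candidate_alphabets` are hypothetical, chosen only for this illustration.

```python
# Finite alphabets for the three kinds of biopolymer.
# Assumption: standard one-letter codes; the source does not name the symbols.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")  # twenty amino acids
RNA = frozenset("ACGU")                          # four ribonucleotides
DNA = frozenset("ACGT")                          # four deoxyribonucleotides

ALPHABETS = {"protein": AMINO_ACIDS, "RNA": RNA, "DNA": DNA}


def is_word_over(word, alphabet):
    """Return True if every symbol of `word` belongs to `alphabet`.

    The empty word is a word over every alphabet.
    """
    return all(symbol in alphabet for symbol in word)


def candidate_alphabets(word):
    """Return the sorted names of all alphabets over which `word` is a word."""
    return sorted(name for name, alphabet in ALPHABETS.items()
                  if is_word_over(word, alphabet))


if __name__ == "__main__":
    # Read left to right: 5' phosphate end to 3' hydroxyl end for RNA/DNA,
    # free amino group to free carboxyl group for proteins.
    for word in ("AUGGCC", "ATGGCC", "MKV"):
        print(word, candidate_alphabets(word))
```

The three alphabets overlap as sets of symbols (for instance, `ACG` is a word over all three), so a string alone does not determine which kind of molecule it denotes. The model is unambiguous only once the alphabet and the left-to-right reading convention have been fixed.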
- Automata Theory with Modern Applications, pp. 231-244. Publisher: Cambridge University Press. Print publication year: 2006.