In this chapter we give a minimalistic, combinatorial introduction to molecular biology, omitting the description of most biochemical processes and focusing on inputs and outputs, abstracted as mathematical objects. Interested readers might find it useful to complement our abstract exposition with a molecular biology textbook, to understand the chemical foundations of what we describe.
DNA, RNA, proteins
Life consists of fundamental units, called cells, that interact to form the complex, emergent behavior of a colony or of a multicellular organism. Different parts of a multicellular organism, like organs and tissues, consist of specialized cells that behave and interact in different ways. For example, muscle cells have a fundamentally different function and structure from brain cells. To understand such differences, one should look inside a cell.
A cell is essentially a metabolic network consisting of a mixture of molecules that interact by means of chemical reactions, with new “output” molecules constantly produced from “input” molecules. A set of reactions that are connected by their input and output products is called pathway. Each reaction is facilitated or hindered by a specific set of molecules, called enzymes, whose regulation affects the status of the entire network inside a cell. Enzymes are a specific class of proteins, linear chains of smaller building blocks (called amino acids) which can fold into complex three dimensional structures. Most proteins inside a human cell consist of amino acids taken from a set of 20 different types, each with peculiar structural and chemical properties.
The environment in which a cell is located provides input molecules to its metabolic network, which can be interpreted as signals to activate specific pathways. However, the cell can also regulate its own behavior using a set of instructions stored inside itself, which essentially specify how to assemble new proteins, and at what rate. In extreme simplification, this “program” is immutable, but it is executed dynamically, since the current concentration of proteins inside a cell affects which instructions are executed at the next time point. The conceptual network that specifies which proteins affect which instruction is called the regulation network.