A vast amount of modern genetic and human genetic research rests on the use of genetic markers. Obtaining markers has long been the geneticists' dream:
It would accordingly be desirable, in the case of man, to make an extensive and thorough-going search for as many factors as possible that could be used … as identifiers. They should, preferably, involve character differences that are (1) of common occurrence, (2) identifiable with certainty, (3) heritable in a simple Mendelian fashion. It seems reasonable to suppose that in a species so heterozygous there must really be innumerable such factors present. (Altenburg & Muller, 1920)
Until 1966, it was not known whether it would ever be possible to get the dreamed-of markers needed for medical genetics and many other studies, because it was not known whether individuals in a species were all genetically very similar (all being the “wild-type” genotype of textbooks) or whether, on the other hand, there is variation between normal individuals. The discovery of variation between normal individuals remained impossible until molecular markers, based on DNA sequence differences between normal individuals, became available. Although today's molecular markers are mostly DNA sequence differences, the first important step in developing molecular markers used differences in the charge on proteins, allowing variants of a given enzyme to be separated using electrophoresis. The discovery of abundant variation by this ‘allozyme’ approach, showing that alleles of genes encoding enzyme proteins could differ (Lewontin & Hubby, 1966; Harris, 1966), implied that these alleles' DNA sequences must vary, and this understanding led to the flowering of molecular marker technologies. Today, these markers are allowing geneticists to map human diseases, and to start to understand their genetic basis. The use of genetic variants was noted by Science (21 December 2007, pages 1842–1843) as the BREAKTHROUGH OF THE YEAR. Referring to human genetic variation, they wrote:
Equipped with faster, cheaper technologies for sequencing DNA and assessing variation in genomes on scales ranging from one to millions of bases, researchers are finding out how truly different we are from one another.
Few biological results are still recognised as so important more than 40 years after they were first published.
The idea of a lab manual describing how to study sequence variants and use genetic markers is thus a good one, though one might suspect that there would be too much for one book, particularly if methods for analysing data are included. This book contains some, but not all, of the useful information that one might hope for. The first section (Study Design) is rather a surprise, as it describes aspects of studying variation, exclusively in humans, without any introduction to tell readers what kinds of questions one might want data on human variation for, or why human variation is of paramount importance. It is simply assumed that human variation data are wanted, and the chapters plunge into short, but quite technical, accounts of a diverse set of topics largely focused on gene association studies to find and map human disease genes. These chapters are either statistical, or deal with web-based data sources, and how to use them (some of these are widely used, such as the dbSNP database of single nucleotide variants and the human HapMap web site, while others are more specialised). There are no accompanying chapters to explain the biology or population genetics of the types of variants, or the potential biases of the available information (such as the ascertainment bias for high frequency variants in the HapMap project). Unless readers already know about other aspects of variation, they may well also assume that genotyping known SNPs is the state of the art in human variation studies (when, in fact, the recently announced “resequencing” of 1,000 humans will probably soon supersede the HapMap as the best kind of data, because variants will be found even if they are at quite low frequencies among humans). On the other hand, the chapters are too brief to be used by biologists who know that background information, but want to learn how to use these web sites.
Perhaps the book will be useful as one component of the training for people joining a lab where the other relevant expertise already exists.
Section 2 is completely different, first giving 17 step-by-step very detailed lab protocols for isolating DNA and RNA from various organisms and types of tissue, including fixed tissue (some of these use kits, and some are for specific organisms, so they are unlikely to have lasting or broad usefulness), then dealing with SNP genotyping and finally with analyses of copy number variation. These are often up-to-date and useful protocols. A problem with describing SNP and other marker genotyping, however, is that new methods are constantly being developed, so the list included is not complete, and will soon be outdated. For example, as far as I could tell (the index, unfortunately, lists acronyms only under their full names) CAPS markers, using restriction enzyme cutting of PCR products (Konieczny & Ausubel, 1993) and their useful derivatives (Neff et al., 1998) are not included, presumably because the variants are not necessarily ascertained from genome sequences.
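To make the CAPS idea concrete, here is a minimal sketch of how a single SNP inside a restriction site changes the fragments obtained when a PCR product is digested. The amplicon sequences and the helper name `digest` are hypothetical and chosen only for illustration. The one borrowed fact is the EcoRI recognition site GAATTC, cut after the first G; because the site is palindromic, searching one strand is sufficient.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


# Hypothetical 16 bp PCR products differing by one SNP (A/G at index 7).
cut_allele = "TTACGGAATTCCATGA"    # contains GAATTC, so EcoRI cuts it
uncut_allele = "TTACGGAGTTCCATGA"  # the SNP destroys the site

cut_fragments = digest(cut_allele, "GAATTC", 1)
uncut_fragments = digest(uncut_allele, "GAATTC", 1)

# Expected band sizes on a gel for each genotype.
genotype_bands = {
    "cut/cut": sorted(set(cut_fragments)),
    "cut/uncut": sorted(set(cut_fragments + uncut_fragments)),
    "uncut/uncut": sorted(set(uncut_fragments)),
}
```

In this toy case the heterozygote shows all three bands, so the three genotypes can be told apart, which is what makes such a marker codominant.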
Marker properties, such as codominance and recessivity, or patterns of location in genomes (Reamon-Buttner et al., 1999), are not discussed, presumably because only certain kinds of variants are considered, and not other marker types used by geneticists not working with humans (AFLPs are included very briefly in a later chapter). The basis for the choices included is not clear. SSCP methods (Hongyo et al., 1993) are not described, but a protocol for temperature gradient capillary electrophoresis is given. There is a protocol for genotyping using gene arrays in this section, but single feature polymorphisms are mentioned only in the context of particular organisms. Sequencing is mentioned as a genotyping method, but Pyrosequencing is mentioned only in passing, despite its growing importance in accurately estimating expression levels (the book does not really consider uses of markers for such purposes). Copy number variant detection is well covered in the context of the human genome, including differences between tissues, but nothing is included about how duplications and deletions can be studied in other organisms, including ones whose genomes have not been sequenced, even though useful information can be obtained from such species. Microsatellites are mentioned as “the variant of choice in linkage studies” without mentioning that this statement refers to humans, and that finding polymorphic microsatellite markers for other species is a major task; the possibility of “null alleles” is not mentioned.
Section 3 is entitled Data Analysis, but again the kinds of analyses are not described together with the biological questions for which they can be used. Chapter 19 (on SNP selection) does outline a clear use, to find variants associated with alleles affecting risk of diseases, but linkage disequilibrium (on which the approach depends) is reviewed later in the book, rather than in this section. Other chapters in this section are also clear and deal with important topics, including assessing statistical significance (of associations) and assessing evidence for the action of natural selection, but these chapters are brief. It seems unlikely that someone can learn how to test for selection without understanding how genetic variability can be quantified and compared between genome regions or populations, and without understanding how selection acting on a variant (say a mutation causing disease, or an advantageous mutation) affects the amount of diversity in and around the gene affected, and the frequencies of variants. The neutral theory of molecular evolution is not mentioned as the null hypothesis against which we can test for selection.
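As an illustration of what quantifying variability and testing against the neutral null hypothesis involve, the sketch below computes two standard diversity estimates for a set of aligned sequences (the mean number of pairwise differences, and Watterson's estimator based on the number of segregating sites) and combines them into Tajima's D, using the constants from Tajima (1989). This is not a method taken from the book; the function names and the toy alignment are assumptions made for illustration, and the code ignores gaps, missing data and multiple hits.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns in which more than one base is present."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """Watterson's estimator for the whole region: S / a1."""
    a1 = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Tajima's D; the variance term is zero for fewer than 4 sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0  # no variation, so the statistic is uninformative
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (hypothetical): two variable sites, both singletons.
alignment = ["ACGTA", "ACGTT", "ACCTA", "ACGTA"]
pi = mean_pairwise_differences(alignment)
theta_w = watterson_theta(alignment)
d = tajimas_d(alignment)
```

Under neutrality with constant population size the two diversity estimates are expected to agree, so D is near zero; an excess of rare variants, as in this toy alignment, makes D negative. Interpreting such a value requires exactly the population genetic background that the book leaves out, since demographic history can produce the same pattern as selection.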
Diversity in various organisms is reviewed next, again with brief chapters for each of the 8 species covered (including three plant species). These mainly point to web sites and give little concrete information about the species. Their levels of variability are not quantified, and aspects of their biology that might affect these levels (e.g. population subdivision) are not mentioned. Yet these chapters do not provide complete bibliographies of variation in the species concerned.
Overall, the book is thus not really about variation, but mainly about detecting variants and genotyping them to use in association studies in humans, and perhaps some day in other species. To describe these applications, it might have been more useful for the book to review uses of variants in genomes, and evaluate their properties and uses for a wide range of purposes, and to accompany this with a web site for the detailed protocols. For an understanding of variation, much more of the biology should have been included. Genotyping variants is not at all the same as studying variation, and readers should be aware that this book introduces only parts of what they will need to understand. In modern biology, technical expertise, and expertise with the best and latest lab and computer methods, are, of course, essential, but so is expertise in the methods needed for analysing the data collected, which should go beyond the technical aspects of proper statistical approaches, and should truly integrate important biological questions with data collection. This book could be useful for a university or lab library, but it is really most likely to be useful as a reference work for people who already know a lot. It could not be used effectively without an understanding of variation and how it behaves, and it is a pity that this aspect is left out of the book. It runs the risk that readers will think that getting the web site (or a lab method) working means that they can start collecting data, without the hard work of deciding the question to be studied, and how to analyse the data to be collected.