For most organisms, the manner in which individuals cope with temperature extremes represents an ecologically and evolutionarily important phenotype that can influence many aspects of fitness, local adaptation and the range of habitats in which populations can persist (Leather et al., Reference Leather, Walters and Bale1993; Clarke, Reference Clarke1996; Zhen & Ungerer, Reference Zhen and Ungerer2008). Temperature extremes represent a major abiotic stressor that all organisms must be able to manage, either through adaptive genetic responses to environmental variation, plastic responses to the environment, or some combination of both genetic and plastic mechanisms (Hoffmann & Parsons, Reference Hoffmann and Parsons1991; Ayrinhac et al., Reference Ayrinhac, Debat, Gibert, Kister, Legout, Moreteau, Vergillno and David2004; Hoffmann et al., Reference Hoffmann, Shirriffs and Scott2005; Hoffmann & Willi, Reference Hoffmann and Willi2008). This is especially true for organisms that are isothermal with their environments (e.g. insects and plants). For such species, the presence of segregating genetic variation for thermal tolerance facilitates survival in a changing environment or expansion outside the historical habitat (Hoffmann et al., Reference Hoffmann, Sorensen and Loeschcke2003).
Drosophila melanogaster has had great success adapting to novel environments during its expansion to its current worldwide species distribution. D. melanogaster originated in central tropical Africa, expanded and colonized more temperate climates in Europe and Asia, and more recently has colonized the rest of the world (David & Capy, Reference David and Capy1988). This movement out of Africa into more temperate regions of the world required D. melanogaster to cope with many novel thermal stresses. The manner in which populations respond to thermal extremes consists of responses to both high- and low-temperature extremes. Although both ends of the thermal spectrum are ecologically important, resistance to low temperatures appears to be a major climatic adaptation among Drosophila species (Gibert & Huey, Reference Gibert and Huey2001; Gibert et al., Reference Gibert, Moreteau, Petavy, Karan and David2001; David et al., Reference David, Gibert, Moreteau, Gilchrist and Huey2003) and populations (Gibert & Huey, Reference Gibert and Huey2001; Gibert et al., Reference Gibert, Moreteau, Petavy, Karan and David2001; Hoffmann et al., Reference Hoffmann, Anderson and Hallas2002; Ayrinhac et al., Reference Ayrinhac, Debat, Gibert, Kister, Legout, Moreteau, Vergillno and David2004; Schmidt & Paaby, Reference Schmidt and Paaby2008) in nature.
Many studies have shown that cold tolerance phenotypes represent an important adaptation in Drosophila; however, a comprehensive understanding of this adaptation requires a complete description of the genetic architecture influencing variation in the trait, as well as the identification of the functional allelic variants underlying such phenotypes. Our knowledge of the genetic architecture of cold tolerance has included studies focused on artificial selection (Tucic, Reference Tucic1979; Chen & Walker, Reference Chen and Walker1993; Anderson et al., Reference Anderson, Hoffmann and McKechnie2005); quantifying genetic variation along cold tolerance clines (Gibert & Huey, Reference Gibert and Huey2001; Hoffmann et al., Reference Hoffmann, Anderson and Hallas2002; Ayrinhac et al., Reference Ayrinhac, Debat, Gibert, Kister, Legout, Moreteau, Vergillno and David2004; Kimura, Reference Kimura2004; Collinge et al., Reference Collinge, Hoffmann and McKechnie2006); associations of variation in cold tolerance with polymorphisms in candidate genes (Anderson et al., Reference Anderson, Collinge, Hoffmann, Kellett and McKechnie2003; Collinge et al., Reference Collinge, Anderson, Weeks, Johnson and McKechnie2008); quantum trait locus (QTL) analyses of cold tolerance phenotypes in recombinant inbred lines (RILs) (Morgan & Mackay, Reference Morgan and Mackay2006; Norry et al., Reference Norry, Gomez and Loeschcke2007, Reference Norry, Scannapieco, Sambucetti, Bertoli and Loeschcke2008); identification of cold tolerant mutations (Takeuchi et al., Reference Takeuchi, Nakano, Kato, Kaneda, Aizu, Awano, Yonemura, Kiyonaka, Mori, Yamamoto and Umeda2009) and genetic/genomic changes in gene expression in response to artificial selection on cold tolerance (Telonis-Scott et al., Reference Telonis-Scott, Hallas, McKechnie, Wee and Hoffmann2009) or various cold stressors (Goto, Reference Goto2000, Reference Goto2001; Qin et al., Reference Qin, Neal, Robertson, Westwood and Walker2005; Sinclair et al., Reference Sinclair, Gibbs and Roberts2007). These studies have clearly demonstrated that substantial heritable genetic variation exists for cold tolerance and that this variation is influenced by a large number of genes. Combining this data across studies exposes a number of promising candidate gene regions within the genome between cytological positions 73A-90B6. This region has been identified by QTL analyses of two different sets of RILs (Morgan & Mackay, Reference Morgan and Mackay2006; Norry et al., Reference Norry, Scannapieco, Sambucetti, Bertoli and Loeschcke2008). Morgan & Mackay (Reference Morgan and Mackay2006) localized a QTL between 76B and 87B, while Norry et al. (Reference Norry, Scannapieco, Sambucetti, Bertoli and Loeschcke2008) identified a broader overlapping QTL between 73A and 90B6. Within the region between 73A and 90B6 many cold tolerance candidate genes exist including desaturase 2 (87B; Greenberg et al., Reference Greenberg, Moran, Coyne and Wu2003; Overgaard et al., Reference Overgaard, Sorensen, Petersen, Loeschcke and Holmstrup2005; Coyne & Elwyn, Reference Coyne and Elwyn2006) as well as multiple cold responsive candidate genes including Frost (85E; Goto, Reference Goto2001; Qin et al., Reference Qin, Neal, Robertson, Westwood and Walker2005; Sinclair et al., Reference Sinclair, Gibbs and Roberts2007) and senescence marker protein-30 (Smp-30) (88D; (Goto, Reference Goto2000; Qin et al., Reference Qin, Neal, Robertson, Westwood and Walker2005; Sinclair et al., Reference Sinclair, Gibbs and Roberts2007). It should be noted that although this broad QTL interval includes the left edge of the inversion, In(3R)Payne (Kennington et al., Reference Kennington, Partridge and Hoffmann2006), all of the cold tolerance candidate loci discussed above are outside the inversion.
Smp-30 is a particularly compelling candidate gene for cold tolerance as it has been shown to be highly responsive to cold stress (Goto, Reference Goto2000; Qin et al., Reference Qin, Neal, Robertson, Westwood and Walker2005; Sinclair et al., Reference Sinclair, Gibbs and Roberts2007). Goto (Reference Goto2000) found Smp-30 was up-regulated after 1–2 days of acclimation to 15°C, while both Qin et al. (Reference Qin, Neal, Robertson, Westwood and Walker2005) and Sinclair et al. (Reference Sinclair, Gibbs and Roberts2007) found Smp-30 was down-regulated after a shorter 2-h treatment at 0°C. Smp-30 contains three exons that encode a 33·3 kDa protein (Goto, Reference Goto2000). Additionally, although Smp-30 is not well studied in D. melanogaster, there are multiple biochemical studies in various mammalian cell lines suggesting that it has a role in Ca2+ ion homeostasis and signalling (Fujita et al., Reference Fujita, Inoue, Kitamura, Sato, Shimosawa and Maruyama1998; Inoue et al., Reference Inoue, Fujita, Kitamura, Shimosawa, Nagasawa, Inoue, Maruyama and Nagasawa1999; Son et al., Reference Son, Kim, Kim, Kim, Chung and Lee2008). This is physiologically significant because a shift in ion homeostasis (e.g. Ca2+) can lead to a change in the electrochemical potential across cell and organelle membranes, which lead to cold-induced immobilization or injury via several mechanisms (Hochachka, Reference Hochachka1986; Pullin & Bale, Reference Pullin and Bale1988; Kelty et al., Reference Kelty, Killian and Lee1996; Kostal et al., Reference Kostal, Vambera and Bastl2004, Reference Kostal, Renault, Mehrabianova and Bastl2007; Takeuchi et al., Reference Takeuchi, Nakano, Kato, Kaneda, Aizu, Awano, Yonemura, Kiyonaka, Mori, Yamamoto and Umeda2009). Finally, other physiological studies have found that calcium is required for rapid cold-hardening protection in the Antarctic midge (Teets et al., Reference Teets, Elnitsky, Benoit, Lopez-Martinez, Denlinger and Lee2008), as chilling causes changes in membrane properties that increase calcium flux, which, in turn, activates calcium-dependent protein kinases and transcription factors that regulate cold-specific proteins and genes.
Given the genetic data for Smp-30 and the known physiological importance of Ca2+ and ion homeostasis and signalling in cold-stress phenotypes in insects, we sought to determine if natural molecular variation in Smp-30 is associated with natural phenotypic variation in cold tolerance. To do this, we used an association mapping approach, which surveys sequence and phenotypic variation within a natural population of flies and statistically tests, if polymorphisms within the candidate gene explain the natural phenotypic variation in the trait. In this study, we are able to link molecular variation in the 2000 base pair coding and non-coding sequence of Smp-30 with natural phenotypic variation in cold tolerance in D. melanogaster.
2. Materials and methods
(i) Drosophila stocks
The lines used in this experiment were inbred lines created from a sample of females collected at the Raleigh, NC Farmer's Market in the spring of 2002. Wild-caught females were used to initiate isofemale lines, which were subsequently inbred (via full-sib mating) for 20 generations, resulting in a panel of 350 fully inbred yet naturally derived lines. This panel of lines has preserved the natural variation among lines, while eliminating variation within lines. In the current study, 257 randomly selected lines were used. Flies were reared on a standard cornmeal–agar–molasses medium at 25°C and 70% humidity on a 12:12 light:dark cycle.
(ii) Phenotypic assays
Cold tolerance was quantified by measuring the chill-coma recovery time developed by Morgan & Mackay (Reference Morgan and Mackay2006) . Four replicates of 25 flies per line per sex were measured in a randomized design. Five-to-seven-day-old same-sex flies were subjected to 0°C for 3 h. Upon removal from the cold, flies were returned to room temperature (23°C) and the percent recovery from chill-coma was recorded after 22 min at 23°C. This experimentally determined time point was chosen because this was the average 50% recovery time point in a pilot study of 13 randomly chosen lines from this panel (Supplemental Figure 1). Cold tolerance was measured on each line during the summer of 2006.
(iii) Quantitative genetic analyses
Analyses of variance (ANOVA) were used to partition variance in cold tolerance among the inbred lines using the following model, y=μ+L+S+L×S+∊, where y is the percent recovery from chill coma, μ is the overall mean, L is the random effect of line, S is the fixed effect of sex and ∊ is the residual variance. We estimated the total genotypic variance (σG2) among lines as σG2=σL2+σLS2, where σL2 is the among line variance component and σLS2 is the variance caused by the interaction of line and sex. The total phenotypic variance was estimated as σP2=σG2+σE2, where σE2 is the environmental variance. The broad sense heritability of cold tolerance was estimated as H 2=σG2/σP2. To compare the amount of genetic and environmental variance, the coefficient of variance was estimated. The coefficient of genetic variance was calculated as , where is the average percent of chill-coma recovery and σG2 is the genetic variation, while the coefficient of environmental variation was calculated as , where is the average percent of chill-coma recovery and σE2 is the environmental variance. All analyses were carried out using SAS in PROC GLM and VARCOMP procedures (SAS Institute).
(iv) Sequencing of Smp-30
DNA was extracted from single flies using the Puregene DNA isolation kit (Gentra Systems) in the summer of 2007. We sequenced approximately 2000 base pairs of the Smp-30 gene region, including 750 base pairs upstream of the translation start site and 100 base pairs downstream of the end of the coding sequence for 125 lines (GenBank accession nos. HM034321–HM034445). This was done using three pairs of PCR primers that were designed to amplify three roughly 600 bp amplicons spanning the Smp-30 gene region. Primers were designed using the published D. melanogaster sequence as a reference and the Primer3 program (http://biotools.umassmed.edu/bioapps/primer3_www.cgi). The three amplicons were amplified per line using 30 μl PCR reactions, which were subsequently sequenced in both directions by the AGTC at the University of Kentucky. Sequences were edited and contigs assembled in Seqman Pro (DNA* Lasergene suite). Contigs were then aligned using MUSCLE (http://www.ebi.ac.uk/Tools/muscle/index.html) and the polymorphisms (both single nucleotide polymorphisms (SNPs) and insertion/deletions) were identified. For the remaining 132 lines, we genotyped the majority of these polymorphisms in each line. This genotyping was done via a combination of partial direct sequencing, SNP-Restriction fragment length polymorphism (Carbone et al., Reference Carbone, Jordan, Lyman, Harbison, Leips, Morgan, DeLuca, Awadalla and MacKay2006), as well as allele-specific amplification of insertion deletion polymorphisms. Ultimately this resulted in a polymorphism dataset that contained on average 214 genotyped lines at each polymorphism. For the association mapping analysis, polymorphisms were used only if the minor allele frequency was at least 2%.
(v) Statistical analyses
(a) Linkage disequilibrium (LD)
Patterns of LD and significance tests were assessed with TASSEL (http://www.maizegenetics.net). LD was quantified as r 2=(P iiP jj−P ijP ji)2/p i(1−p i)p j(1−p j), where pi and pj are the allele frequencies at the ith and jth locus, respectively, and Pij are the haplotype frequencies at loci i and j. Significance of pairwise LD was determined using Fisher's exact test. Allele frequencies for each of the polymorphisms were estimated using PROC FREQ in SAS.
(b) Association mapping
Statistical associations between polymorphisms and percent recovery from chill coma were determined by a set of two-way ANOVAs. For each of the 79 polymorphisms, the variation among sex-specific line-means was explained using the following model: y=μ+ M+ S+ M× S+∊, where y is the sex-specific line-mean of percent recovery from chill coma, μ is the grand mean, M and S are the fixed effects of marker allele and sex, respectively and ∊ is the residual error. The terms of primary interest in an association analysis are the main effect of marker (M) and the interaction effect of marker-by-sex (M×S), as these correspond to the effect of overall or sex-specific allelic variation on the phenotype of interest. In our analysis, we tested 79 polymorphisms, thus to control the type I error at an experiment wise level of less than or equal to 0·05, we used the highly conservative Bonferroni correction. For 79 polymorphisms, our significance threshold was equal to P=0·00063 or –log(P)=3·2007.
In addition to investigating the individual main effects of markers, we also examined pairwise epistatic interactions among polymorphisms in the Smp-30 gene region shown to be associated with percent recovery from chill coma. To do this, the variation among sex-specific line-means was explained using the following model: y=μ+Mi +Mj +S+Mi ×Mj +Mi ×S+Mj ×S+Mi ×Mj ×S+∊, where y, μ, S and ∊ are identical to the analysis presented above, but now Mi and Mj are the fixed marker effects of significant polymorphisms identified above. In this epistatic analysis, the terms of primary interest are the interaction effects between the two markers (i.e. Mi ×Mj and Mi ×Mj ×S), as these correspond to the effect of overall or sex-specific epistatic effects on the phenotype of interest. This analysis of epistasis was applied only to markers with significant main effect. One limitation of the formal epistatic analysis is that to detect interactions among two biallelic markers, it is essential that all the four two-locus genotypes are represented in the dataset. Unfortunately, for polymorphisms in strong LD or close physical linkage this is not always the case. To deal with this problem, we also performed analyses that investigated the influence of haplotype variation on percent recovery in chill coma. To do this, we used a simple two-way ANOVA similar to the single marker analysis presented above, except rather than using the fixed effects of a marker and sex, we tested the fixed effects of haplotype and sex. Haplotypes were constructed by concatenating the two-, three- or four-way combinations of significant single markers into a haplotype variable. For the epistatic and haplotype analysis, we used a significance threshold of P=0·05.
(c) Molecular population genetic analyses
We used DnaSP Version 5.0 (Librado & Rozas, Reference Librado and Rozas2009) to evaluate patterns of molecular variation within Smp-30 and test for patterns of molecular evolution that are consistent with departures from neutrality. Molecular variation in Smp-30 was quantified by the nucleotide diversity based on the number of segregating sites (θS; Watterson, Reference Watterson1975), the average number of nucleotide differences between pairs of sites (θπ; Nei & Tajima, Reference Nei and Tajima1981), as well as the number of singleton sites (θη; Fu & Li, Reference Fu and Li1993). To test for departure from neutrality, we used Tajima's D (Tajima, Reference Tajima1989). Estimates of D were calculated for the entire gene as well as for sliding window intervals spanning the entire gene. For the sliding window analysis, we used 50 bp windows with a step size of 10 bp.
(i) Variation among lines
(a) Phenotypic variation
We observed abundant natural phenotypic variation in cold tolerance among 255 natural inbred lines sampled from the Raleigh, North Carolina farmer's market (Ayroles et al., Reference Ayroles, Carbone, Stone, Jordan, Lyman, Magwire, Rollmann, Duncan, Lawrence, Anholt and MacKay2009), by measuring the percentage of flies that recovered from chill coma during a 22 min observation period at room temperature after 3 h at 0°C (F 255,255=8·89; P<0·0001; Fig. 1, Supplementary Figure 1). Variation among the lines ranged from lines that were extremely susceptible to cold (0% recovered in the observation period) to lines that were extremely resistant to cold (100% recovered in the observation period), and all intermediate cold tolerance phenotypes (Fig. 1). However, on average 49·96% of the flies were recovered during the 22-min observation period, suggesting that our rapid percentage assay is near the 50:50 recovery point, which corresponds to the assay point with maximum statistical power.
A significant portion of this phenotypic variation was the result of sex-specific genetic effects, as demonstrated by the highly significant line-by-sex interaction term (F 255,1471=2·23; P<0·0001). Finally, consistent with the significant line and line-by-sex terms in the ANOVA among lines, most of this variation in the percentage of flies recovered from cold was the result of genetic effects, as established by large broad-sense heritability estimates from pooled analyses, H 2=0·707 (σG2=766·35; σP2=1083·77) and sex-specific analyses (H Male2=0·710; H Female2=0·705). In addition, the pronounced genetic effect was consistent with the large magnitude of the coefficient of genetic variation (CVG=55·40), relative to the coefficient of environmental variation (CVE=35·66). Although all three of these estimates of variation (H 2, CVG, and CVE) are high for Drosophila, they are consistent with other studies of chill-coma recovery time in D. melanogaster (Anderson et al., Reference Anderson, Hoffmann and McKechnie2005; Morgan & Mackay, Reference Morgan and Mackay2006).
(b) Molecular variation in the Smp-30 gene region
We sequenced approximately 2000 bp encompassing the Smp-30 gene region in this set of wild-derived inbred lines, including the exons, introns and 5′-UTR (5′-untranslated region) and 3′-UTR, and 750 bp upstream of the translation start site. The gene region was highly polymorphic, with 79 polymorphisms that had minor allele frequency (MAF) of at least 2% (Fig. 2; Supplemental Table 1). Seventy-two of these polymorphisms were SNPs, while seven were insertion–deletion (indel) polymorphisms. The majority of these polymorphisms we located in non-coding regions, including the putative 5′ regulatory region, introns, and the 3′-UTR (Fig. 2). Consistent with the large number of polymorphisms, the estimates of nucleotide diversity were θS=0·00976, θπ=0·01213 and θη=0·00986.
We calculated LD between all pairs of polymorphisms (Fig. 2). We observed significant LD among many polymorphisms; however, the actual association between alleles (i.e. the absolute values of LD) as measured by R 2 was quite small, with a few exceptions where markers in close physical proximity exhibited large R 2 values. Thus, the widespread significance of the LD is likely an effect of our large sample size rather than the presence of defined haplotype blocks. The majority of the significant LD was contained within an approximately 1000 bp block at the 5′ end of the gene and in an 800 bp block at the 3′ end of the gene (Fig. 2). Although there was some significant LD between these blocks, the overall frequency of significant LD declined with physical distance and thus LD was reduced between polymorphisms from the 5′ and 3′ ends of the gene.
(ii) Associations between phenotypic variation and molecular polymorphism
We tested for associations between cold tolerance and each of the 79 polymorphisms identified within the Smp-30 gene region. Figure 3 a summarizes the significance, on a −log(P) scale, of these single-marker analyses. We observed four markers that exhibited highly significant associations with cold tolerance; these include G-632A (P=0·00004), A-630G (P=0·00025), Del-603In,Del2 (P=0·00032) and C-11T (P=0·00002). All of these polymorphisms exceed the highly conservative experiment-wise Bonferroni correction (P=0·05/79=0·00063; red dashed line in Fig. 3 a). In addition, all of the markers are located in non-coding portions of the Smp-30 gene region; three of the polymorphisms (G-632A, A-630G and Del-603In,Del2) are located within 30 base pairs of one another and are in the putative 5′ regulatory region of the gene (Goto, Reference Goto2000), while the fourth (C-11T) resides in the first intron, 11 base pairs upstream of the translation start site (Fig. 3 a). Two of the significant polymorphisms (G-632A and C-11T) have skewed allele frequencies, with frequencies of the common allele equal to 0·83 and 0·92, respectively, while the other two polymorphisms (A-630G and Del-603In,Del2) have common allele frequencies near 0·50 (Supplemental Table 1).
The mean difference in cold tolerance between the homozygous genotypes for the significant polymorphisms ranged from −12·29% to 20·86% (Table 1). However, the mean phenotypic differences attributable to each polymorphism should be interpreted with caution because the polymorphisms are not independent, due to LD (Fig. 2). Nearly all of the significant polymorphisms were in strong LD with each other (P<0·0001; Fisher's exact test; Fig. 2). The single exception was the marker pair A-630G and C-11T, which exhibited significant but a less robust level of LD (P<0·01; Fig. 2). Regardless of the lack of independence caused by LD among the significant polymorphisms, on a striking pattern that emerges from this data is the finding that two of the significant polymorphisms (G-632A and A-630G), which are located only 2 bp apart in the 5′ putative regulatory region of Smp-30 (Goto, Reference Goto2000), have opposite effects on chill-coma recovery (Table 1). Marker G-632A has a difference in chill–coma recovery between the homozygous alleles [G (p=0·83); A (q=0·17)] of 16·44%, while marker A-630G has a difference in chill–coma recovery between the homozygous alleles [A (p=0·56); G (q=0·44)] of −11·04%. This pattern is noteworthy because it is unlikely that a pair of polymorphisms in such a close physical linkage would have opposite effects on the same trait unless there was significant balancing selection acting to maintain the pair of antagonistic polymorphisms.
a Marker names correspond to the common allele, the position in base pairs relative to the ATG start site (i.e. the beginning of exon 2), prefixed by the common allele and followed by the minor allele. In cases where three allelic classes were discovered, the minor alleles are listed in order of descending allele frequency.
b Minor allele frequency.
c The sample size (N, number of lines) used in the ANOVA.
d Mean difference in cold tolerance (% recovery) between homozygous genotypes (common homozygote – rare homozygote).
The three significant polymorphisms in the 5′ putative regulatory region of Smp-30 (Goto, Reference Goto2000) were in strong LD with one another and formed only four haplotypes (Fig. 4 a). This three-marker haplotype was significantly associated with percent recovery from chill coma (F 3,390=7·48; P<0·0001; Fig. 4 a). To test for epistastis among the significant polymorphisms, we separated the polymorphisms into two regions: region 1 (Fig. 4 a) contained the set of three physically linked markers (i.e. G-632A, A-630G and Del-603In,Del2) in the 5′ putative regulatory region of Smp-30 (Goto, Reference Goto2000), while region 2 was the C-11T marker (Fig. 4 b). We tested for epistasis between the three possible pairwise combinations of markers in region 1 and the single marker in region 2 and did not detect any significant epistatic interactions in any analysis (Fig. 4 c, d). This can be easily seen by the lack of significant interaction effects between markers: G-632A by C-11T (F 1,390=0·02; P=0·8947; Fig. 4 c), Del-603In,Del2 by C-11T (F 1,408=0·79; P=0·3739; Fig. 4 d) and A-630G by C-11T (F 1,390=0·44; P=0·5071; graph not shown).
(iii) Molecular population genetics of Smp-30
We tested for evidence of non-neutral molecular evolution by estimating Tajima's D (Tajima, Reference Tajima1989). Tajima's D was estimated using the set of 137 complete sequences for the entire gene region and with an analysis of 50 bp sliding windows (Fig. 3 b). Tajima's D over the entire gene region was positive (D=0·6752), but not significantly different from zero. However, when we performed a sliding window analysis, we identified three regions where Tajima's D is significantly positive (Fig. 3 b), indicating a pattern of molecular variation consistent with the maintenance of excess variation by balancing selection. The region with the most significant Tajima's D values occurred in the putative 5′ regulatory region of the gene (Goto, Reference Goto2000) between sites −581 and −650 (Fig. 3 b). Interestingly, this region overlaps the region containing three polymorphisms (G-632A, A-630G and Del-603In,Del2) that are significantly associated with phenotypic variation in cold tolerance (Fig. 3 a; Table 1), which are in strong LD, but that have phenotypic effects that are in opposite directions based on differences between the homozygous genotypes at each polymorphism. The common allele at marker G-632A increases the percentage of flies recovered from chill coma, while the common allele at markers A-630G and Del-603In,Del2 decrease the percentage of flies recovered from chill coma.
One of the primary goals of evolutionary genetics is to identify the causal allelic variants that contribute to phenotypic adaptation in nature (Lewontin, Reference Lewontin1974; Hoekstra & Coyne, Reference Hoekstra and Coyne2007). In Drosophila, cold tolerance is a major climatic adaptation that has been documented by the presence of phenotypic clines on multiple continents (Gibert et al., Reference Gibert, Moreteau, Petavy, Karan and David2001; Hoffmann et al., Reference Hoffmann, Anderson and Hallas2002; Ayrinhac et al., Reference Ayrinhac, Debat, Gibert, Kister, Legout, Moreteau, Vergillno and David2004; Schmidt & Paaby, Reference Schmidt and Paaby2008) and also significantly influences the divergence and climate matching among Drosophila species (Gibert & Huey, Reference Gibert and Huey2001; Gibert et al., Reference Gibert, Moreteau, Petavy, Karan and David2001). As a result of these recently documented adaptive patterns, many regions of the genome have also been identified that influence cold tolerance in Drosophila, including the candidate gene Smp-30 (Goto, Reference Goto2000; Qin et al., Reference Qin, Neal, Robertson, Westwood and Walker2005; Sinclair et al., Reference Sinclair, Gibbs and Roberts2007), which encodes a protein product involved in Ca2+ ion homeostasis and signalling. However, to understand cold adaptation in nature, we must link polymorphisms in functionally important loci like Smp-30 with ecologically relevant natural phenotypic variation. To date, no study has performed a detailed analysis of sequence variation within Smp-30 and cold tolerance variation within a natural population (but see Rako et al., Reference Rako, Blacket, McKechnie and Hoffmann2007) to identify the segregating allelic variants influencing cold tolerance in nature.
Our study confirms that Smp-30 is a cold tolerance gene (Goto, Reference Goto2000; Qin et al., Reference Qin, Neal, Robertson, Westwood and Walker2005; Sinclair et al., Reference Sinclair, Gibbs and Roberts2007) and more importantly that there is a set of four polymorphisms (G-632A, A-630G, Del-603In,Del2 and C-11T), three of which are in the putative 5′ regulatory region of the gene (Goto, Reference Goto2000) and one that is just upstream of the ATG start site, which are strongly associated with variation in cold tolerance. All of these polymorphisms are at appreciable frequencies within our population (MAF⩾0·08) and all were in significant LD with one another. This significant LD makes it difficult to disentangle the individual influence of each polymorphism on variation in cold tolerance. However, if we use the most conservative (i.e. the smallest; A-630G) estimate for the difference in the cold tolerance between homozygous marker classes, we observe a change of 11% between the homozygous genotypes, which is a dramatic effect on the cold tolerance.
LD throughout the 2-kb Smp-30 gene region obstructs our ability to identify the quantitative trait nucleotides (Long et al., Reference Long, Lyman, Langley and Mackay1998; Carbone et al., Reference Carbone, Jordan, Lyman, Harbison, Leips, Morgan, DeLuca, Awadalla and MacKay2006) responsible for the observed variation in cold tolerance. Thus, it is possible that G-632A, A-630G, Del-603In,Del2 and C-11T are not the causal polymorphisms, but rather are in LD with other true causal regulatory polymorphism upstream of these markers. This is because markers G-632A, A-630G and Del-603In,Del2 all reside within the first 200 base pairs of our sequencing reads; therefore it is possible that by only sequencing 750 base pairs upstream of the ATG start site we are missing true regulatory polymorphisms upstream of these strongly associated markers. Future studies should extend the sequencing sufficiently far in the 5′ and 3′ directions outside of the coding sequence to observe the ultimate decay of LD between our significant polymorphisms and other possible unidentified causal polymorphisms further upstream. One other study has evaluated the association between a variation in 5′ regulatory region Smp-30 and natural variation in cold tolerance in an Australian population (Rako et al., Reference Rako, Blacket, McKechnie and Hoffmann2007). In this study, Rako et al. (Reference Rako, Blacket, McKechnie and Hoffmann2007) used a pair of primers that span our three significantly associated polymorphisms, G-632A, A-630G and Del-603In,Del2, but found no significant association between their four different length variant (330–342 bp) alleles. Rako et al. (Reference Rako, Blacket, McKechnie and Hoffmann2007) used a somewhat atypical association analysis experimental design; however, they were able to detect multiple associations between other temperature stress genes and thermal phenotypes, suggesting that their approach was robust. Thus, in order to evaluate disagreement between our studies we placed their reverse primer sequence (5′-TTAAAAGGTATAAAACAATCAACACTG) on our alignment (at position −491). We were not able to place their forward primer sequence as it occurs approximately 100 bp upstream of the start of our sequence reads. By scanning the region flanked by their reverse primer and the start of our sequence reads, we were able to construct four plausible length variant alleles from our sequence data. Although we cannot definitely identify the molecular basis of Rako et al.'s (Reference Rako, Blacket, McKechnie and Hoffmann2007) 12 bp length variation (i.e. 330–342 bp) PCR polymorphism, we did observe two indel polymorphisms within the region that would have been amplified by their primers. The first 3 bp indel (Del-702In) was not associated with cold tolerance, while the second 9–10 bp indel (Del-603In, Del2) was significantly associated with cold tolerance (Table 1; Fig. 3). If we combine the length variation attributable to each of these indel markers and create a composite polymorphism, we generate six possible length variant alleles, because marker Del-603In,Del2, has a common 9 bp deletion allele, an insertion allele and a rare 10 bp deletion allele (Supplemental Table 1). Thus the alleles at this composite marker in order of length from shortest to longest are 13, 12, 10, 9, 3 and 0 bp deletion alleles. Four of these alleles are present in our data (13, 12, 9 and 3 bp). If we repeat our association analysis and test for the association between this composite length polymorphism and percent recovery from chill coma, we observe a nominally significant association [F 3,310=3·90; P=0·0093; –log(P)=2·03]; this does not exceed our multiple testing criteria [P=0·0006; –log(P)=3·20; Fig. 3]. Thus, this analysis of our composite length variant allele suggests that the lack of association observed by Rako et al. (Reference Rako, Blacket, McKechnie and Hoffmann2007) is likely not in disagreement with our data. However, by decoupling this length variation by direct sequencing, we are able to detect a significant association between cold tolerance and marker Del,-603In,Del2 as well as two other SNPs (G-632A and A-630G) which could not be detected via the PCR product length polymorphism.
This study shows that Smp-30 is associated with natural phenotypic variation in cold tolerance; however, the most complete picture of this association emerges when the results of the association, LD, and molecular evolutionary analyses are considered together. The presence of multiple highly significant polymorphisms in close proximity to one another (2–30 bp apart), which are also in strong LD with one another, but display opposite phenotypic effects (i.e. G-632A and A-630G) suggests that the allelic variation at these polymorphisms is likely being maintained by balancing selection. This argument becomes even more compelling when it is coupled, the Tajima's D analysis that identifies this same region as a region of the gene with the maintenance of an excess variation (i.e. a positive Tajima's D) suggesting that this putative 5′ regulatory region (Goto, Reference Goto2000) between sites −581 and −650 is not only significantly associated with natural phenotypic variation in chill coma but is also being maintained by balancing selection. These results suggest that multiple mutations in non-coding regions can have significant effects on phenotypic variation on adaptive traits within natural populations, and that balancing selection can maintain physically linked polymorphisms with opposite effects on phenotypic variation in nature.
We thank M. U. Nasser, S. Menon and L. H. Duncan for assistance with flies used in this study and to L. C. Fallis for comments on a previous version of this manuscript. This work was supported by grants from the Kansas State University Ecological Genomics Institute and the Arthropod Genomic Center to TJM, The Kansas State University McNair Scholar's Program to KJC and National Institutes of Health Grant GM-45146 to TFCM.