Human scalp hair has a notoriously variable morphology. The shape and degree of the curl in hair fibers are associated with the number of neocortical cells and the distribution and types of keratins (Thibaut et al., Reference Thibaut, Barbarat, Leroy and Bernard2007). Previous research has shown that hair follicle morphogenesis is affected by a complex interaction between growth and homeostasis hormone pathways, as well as regulatory pathways (Vogt et al., Reference Vogt, McElwee, Blume-Peytavi, Blume-Peytavi, Whiting and Trüeb2008). The basic structure of hair (cuticle, cortex and medulla) is common to all humans; however, the dimensional shape of the hair fiber and ultrastructure varies considerably among and within populations (Koch et al., Reference Koch, Shriver and Jablonski2019; Loussouarn et al., Reference Loussouarn, Garcel, Lozano, Collaudin, Porter, Panhard and de La Mettrie2007).
The heritability of self-reported hair characteristics has typically been found to be high; hair color was reported to show a 73−99% broad-sense heritability in a Dutch sample (Lin et al., Reference Lin, Mbarek, Willemsen, Dolan, Fedko, Abdellaoui and Hottenga2015), while hair curliness was found to be 85−95% heritable in an Australian cohort (Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009). Moreover, a previous analysis using the same sample utilized in this present study showed that hair curvature is a heritable trait with strong genetic influence (h 2 = .52−.64; Ho et al., Reference Ho, Brims, McNevin, Spector, Martin and Medland2016).
Previous genome-wide association studies (GWAS) discovery and meta-analyses have identified 12 loci (located near the trichohyalin [TCHH], ectodysplasin A receptor [EDAR], WNT10A, FRAS1, OFCC1, TRAF2, PRSS53, PADI3, LOC105373470, S100A11, LCE3E and LOC391485 genes) associated with hair morphology and structure (Adhikari et al., Reference Adhikari, Fontanil, Cal, Mendoza-Revilla, Fuentes-Guajardo, Chacon-Duque and Ruiz-Linares2016; Fujimoto et al., Reference Fujimoto, Kimura, Ohashi, Omi, Yuliwulandari, Batubara and Tokunaga2008; Liu et al., Reference Liu, Chen, Zhu, Hysi, Wu, Adhikari and Kayser2018; Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009; Pospiech et al., Reference Pospiech, Chen, Kukla-Bartoszek, Breslin, Aliferi and Andersen2018; Westgate et al., Reference Westgate, Ginger and Green2017). These studies found that the TCHH gene has a major effect on hair morphology in European and Latin American populations, but no effect in East Asian populations (Eriksson et al., Reference Eriksson, Macpherson, Tung, Hon, Naughton, Saxonov and Mountain2010; Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009). By contrast, the EDAR gene, on chromosome 2, has a strong association and high prediction accuracy for straight hair in East Asians (Tan et al., Reference Tan, Yang, Tang, Sabeti, Jin and Wang2013) and a minor (less than 0.2%) effect on hair structure in Europeans (Liu et al., Reference Liu, Chen, Zhu, Hysi, Wu, Adhikari and Kayser2018).
The studies cited above utilized macro-level categorical measures of hair curvature (e.g., straight, wavy or curly), assessed via self-report or researcher classification. While these studies demonstrated that simple classification schemes can be used to identify genetic variants influencing hair curvature, the present study sought to assess whether the use of micro-level quantitative assessments of hair curvature would increase the power to identify additional genomic loci influencing this trait. A GWAS utilizing digital quantification of continuous variation in human eye color found three new loci in comparison to the use of categorical eye color measures in the same individuals (Liu et al., Reference Liu, Wollstein, Hysi, Ankra-Badu, Spector, Park and Kayser2010). This indicates that using quantitative measures may be one way of finding novel genetic variants associated with human morphological traits.
With the aim of identifying additional loci associated with hair morphology using a quantitative measure, we follow-up on an earlier study by our group that evaluated the heritability and genetic correlations of a quantitative micro-level measure of hair curvature in a sample of twins and siblings of European descent (Ho et al., Reference Ho, Brims, McNevin, Spector, Martin and Medland2016). In the current study, we evaluate whether a GWAS of the quantitative micro-level hair curvature measure replicates the GWAS results based on observational macro-level measures of hair morphology in the same sample.
Details of the sample used in this study are reported in Ho et al. (Reference Ho, Brims, McNevin, Spector, Martin and Medland2016). Briefly, measures of hair curvature were obtained for 3057 twins and their siblings, within the context of the Brisbane Longitudinal Twin Study (BLTS; Wright & Martin, Reference Wright and Martin2004). The BLTS is an ongoing study initiated in 1992 in Brisbane, Australia. Its initial focus was research on melanoma; however, the study has collected information on a wide range of traits and, over the years, many parallel studies have been derived from it. Participants in the BLTS provide information on many traits such as height, skin color and wellbeing. Participants also provided a blood sample that is used for biochemical and DNA analyses. For hair studies, clippings of 2−4 cm were taken from the scalp of participants.
Written informed consent was obtained from participants and their parents or guardians. Ethics approval for the study was obtained from the QIMR Berghofer Medical Research Institute Human Research Ethics Committee.
Quantitative measure of hair curvature
The quantitative micro-level measures of hair curvature were obtained using an Optical Fibre Diameter Analyzer (OFDA 2000, BSC Electronics, Myaree, WA, Australia). The OFDA is an industry-standard technology used for measuring wool fiber diameter and quality (Wortmann & Schwan-Jonczyk, Reference Wortmann and Schwan-Jonczyk2006). For this study, we used hair segments of 200 µm in length. The OFDA measured the angle of curvature of the hair in degrees; this measure was multiplied by 5 to represent the angle of a 1 mm segment of the hair strand. The data provided by the OFDA was the number of snippets (hairs) per bin (with bins ranging from 0 to 360°). In a previous publication (Ho et al., Reference Ho, Brims, McNevin, Spector, Martin and Medland2016), we show that the distribution of this data is approximately log-normal, therefore for this study, the data were log-normalized. Measurements beyond three standard deviations above and below the mean were removed from analyses.
Observational measure of hair curvature
The observational macro-level measure of hair curvature was conducted by trained research nurses who scored participants’ hair according to a 3-point scale: straight, wavy or curly. The data were collected at up to three time points for each individual included in the study, at ages 12, 14 and 16 years. For the present study, we used the data collected at age 12.
Structural equation modeling (SEM)
To determine possible common genetic effects underlying the correlation between the quantitative and observational measures of hair curvature, we used a bivariate Cholesky twin model (Neale & Cardon, Reference Neale and Cardon1992). Previous univariate twin models for this cohort have found that the genetic variance of hair curvature is better explained by a model that includes additive, dominant and environmental components (ADE; Ho et al., Reference Ho, Brims, McNevin, Spector, Martin and Medland2016; Medland, Zhu et al., Reference Medland, Zhu and Martin2009). In the current study, we started by fitting a bivariate Cholesky ADE model without sex limitation. Simplified AE and E models were tested for goodness of fit to determine whether parameters could be dropped from the full model. The change in model fit was tested using likelihood ratio chi-square tests. The distribution of twice the difference in log-likelihood (−2LL) approximates Δχ2, with the degrees of freedom (df) for this test equal to the difference in df between the models. SEM analyses were performed using the R package OpenMx version 2.6.7 (Boker et al., Reference Boker, Neale, Maes, Wilde, Spiegel, Brick and Fox2011; Neale et al., Reference Neale, Hunter, Pritkin, Zahery, Brick, Kirkpatrick and Boker2016).
Genotyping and GWAS
Genotyping data were available for 2975 individuals (twins and siblings) with hair measurement data. After quality control, data for 2225 individuals were included in the GWAS analyses. Participants were previously genotyped using the Illumina 610K genotyping array and imputed to the Haplotype Reference Consortium reference (McCarthy et al., Reference McCarthy, Das, Kretzschmar, Delaneau, Wood and Teumer2016, Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009). Genome-wide association analyses were conducted using RARERMETALWORKER (Feng et al., Reference Feng, Liu, Zhan, Wing and Abecasis2014). This program assesses the evidence for association at each variant using the imputed dosage format, parameterized as x ij = βdose + βsex + βage + µ where j denotes the trait value for an individual within family i, while accounting for familial relatedness and zygosity. Univariate GWAS analyses were performed for each measure. Sex and age were included as covariates in both GWAS. Following each GWAS analysis, we applied standard quality control procedures using thresholds of quality score R 2 > .5 and minor allele frequency (MAF) > 0.05, including only those single-nucleotide polymorphisms (SNPs) reaching quality control thresholds in subsequent analyses (N SNP = 7,533,529).
Conditional association analyses on the most significant SNP for each GWAS was performed with COJO, a utility included in the GCTA software package (Yang et al., Reference Yang, Lee, Goddard and Visscher2011). The conditional analysis determines whether the significant signals from multiple SNPs in the same genomic region were due to high linkage disequilibrium (LD) between SNPs (i.e., SNPs in high LD representing one association signal) or due to the presence of multiple causal variants. GWAS p values were used to perform gene-based association analyses with fastBAT (Bakshi et al., Reference Bakshi, Zhu, Vinkhuyzen, Hill, McRae, Visscher and Yang2016). This analysis takes into account the aggregated effect of a set of SNPs defined by the degree of LD between them. The 1000 Genomes Build37 (hg19; 1000 Genomes Project Consortium et al., Reference Auton, Brooks, Durbin, Garrison, Kang and Abecasis2015) was used as a reference genome for both the conditional analyses and gene association analyses. Bioinformatic analyses to investigate possible functional effects of the most likely candidate SNPs were conducted using FUMA (Watanabe et al., Reference Watanabe, Taskesen, van Bochoven and Posthuma2017), SIFT (Schneider et al., Reference Schneider, Hu, Sim, Kumar, Henikoff and Ng2012) and PolyPhen-2 (Adzhubei et al., Reference Adzhubei, Schmidt, Peshkin, Ramensky, Gerasimova, Bork and Sunyaev2010).
The distributions of quantitative measurements and the observational measures of straight, wavy and curly hair are presented in Figure 1. Means were not significantly different (p > .05), and boxplots whiskers overlap between categories (M Straight = 0.68 SD = 0.07, M Wavy = 0.71 SD = 0.07, M Curly = 0.80, SD = 0.10).
Results for the bivariate ADE decomposition are presented in Table 1. The goodness of fit of the model did not decrease significantly if the component D was omitted from the model (p = .43). In contrast, the fit changed significantly when A and D were dropped (p < .001). The AE model was adopted as the best fit. Path coefficients from the AE model are presented in Figure 2. The estimated genetic variance suggests a very strong genetic component for hair curvature, 90% for the observational measures, and 58% for the quantitative measurements (Table 1). The cross-trait genetic correlations from the AE model were rg A = .41 (95% CI [0.35, 0.46]).
a Variance was decomposed into additive (A), no-additive (D) and unique environmental (E) factors for the quantitative micro-level and observational macro-level measures of hair curvature. The 95% confidence intervals are given within square brackets, with the best fitting model in bold.
b –2LL minus twice the log-likelihood of the model, df, degrees of freedom; Δdf, difference of degree of freedom; Δχ2, change in chi-squared; AIC, Akaike information criterion.
Genome-wide Association Analysis
The GWAS of quantitative hair curvature measurements resulted in one genome-wide significant (defined as p < 5.0 × 10−8) association signal on chromosome 1q21.3 (Figure 3A; Supplementary Table S1). This signal is represented by five significantly associated SNPs (lead SNP rs12130862, β= −0.16, p = 9.5 × 10−09) that are in very high LD with each other (minimum r 2 between SNPs = .93; Supplementary Table S2).
The GWAS of observed measures of hair curvature also resulted in one genome-wide significant association signal, also on chromosome 1q21.3 (Figure 3B; Supplementary Table S1), with 12 significantly associated SNPs (lead SNP rs11803731, β = −0.24, p = 2.1 × 10–17) in moderate to very high LD with each other (minimum r 2 = .45; Supplementary Table S2). This result replicates previous findings (Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009) of the influence of rs11803731 on observable hair curvature. The two lead SNPs from the current analyses, rs12130862 and rs11803731, are in very high LD with each other (r 2 = .95). Association analyses conditioning on the top SNPs for each of the hair curvature measures revealed no additional significant SNPs for either measure at this locus, indicating these SNPs represent one association signal.
We plotted the effect sizes for the top 1000 SNPs (Figure 4) for both GWAS. The effect sizes correlate significantly (r 2 = .815, p < .01).
Gene-based analyses of both hair curvature GWAS supported the individual SNP findings of significant association signal (with a Bonferroni-corrected gene-based p < 1 × 10−6 to account for tests on 24,036 genes) on chromosome 1q21.3. For the observational measure, there were six significantly associated genes: TCHH, S100A11, TCHHL1, NBPF18p, RPTN and S100A10. All six genes are located within a region of approximately 400 Mb on chromosome 1q23.1. For the quantitative measure, there were no significant gene associations: the highest association was for TCHH (p = 1.0 × 10−5; Table 2, Figure 5).
Note: fastBat gene association analysis of quantitative micro-level and observational macro-level measures of hair curvature. Gene, location in chromosome (Chr) and starting position of the gene in base pairs (bp), number of SNPs associated with each gene, gene association p value and top SNP p value. SNP, single-nucleotide polymorphism.
The SNP rs11803731 is an exonic SNP that would result in the replacement of the amino acid lysine by methionine in the TCHH protein (p.L790M). This change is predicted to be ‘damaging’ by SIFT and ‘probably damaging’ by PolyPhen, and it has previously been suggested that rs11803731 could be the functional SNP at this locus (Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009). Functional analyses indicate that rs11803731 and SNPs in high LD with rs11803731 are also eQTLs that influence the expression of other nearby genes, including the hornerin (HRNR) and thioesterase superfamily member 4 (THEM4) genes in human skin (Figure 3C and 3D, Supplementary Tables S2 and S3).
The curvature of human hair has been shown to be a highly heritable trait. While previous association studies have identified some genetic variants influencing hair curvature using observational macro-level measures, this is the first study to use quantitative micro-level measures of hair curvature.
We first reconfirmed the moderated cross-trait genetic correlation (rg A = .41) between the quantitative and observational measures of hair curvature. The estimates of heritability for each measure (quantitative = 57%, observational = 95%) agree with previous estimations of the heritability of hair shape (Ho et al., Reference Ho, Brims, McNevin, Spector, Martin and Medland2016; Medland, Zhu et al., Reference Medland, Zhu and Martin2009).
The comparison of the quantitative and observational measures of hair curvature suggests that the quantitative assessment of curvature provided by the OFDA analysis is a good predictor of hair curvature categorized by observation alone. High correlation between the quantitative and observational effect sizes (Figure 4) shows these are predictive of each other, indicating that quantitative measures are comparable to observational measures in the detection of genetic variants predicting hair curvature.
Genome-wide association analyses for the two measures resulted in one genome-wide significant signal on chromosome 1q21.3. Association between variants at 1q21.3 and hair shape was previously reported by Medland, Nyholt et al. (Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009) and reconfirmed by a meta-analysis that evaluated a quantitative measure of hair curvature in 28,964 individuals that included cohorts of non-European ancestry (Liu et al., Reference Liu, Chen, Zhu, Hysi, Wu, Adhikari and Kayser2018). In the current analysis, a set of five SNPs that includes rs11803731 and rs12130862, in high LD with each other and shown to represent the same signal in conditional GWAS analyses, surpassed the genome-wide significant threshold for both GWAS. These five SNPs are considered as the most likely candidate causal SNPs driving the association signal at the 1q21.3 locus, and are located in the region of the TCHH and TCHHL1 genes. In the gene-based analyses, both genes were significantly associated with observational measures of hair curvature. TCHH and TCHH1 encode the TCCH and TCCH-like 1 proteins, respectively. TCCH binds to keratins and is expressed in the inner root sheath of the hair follicle and in the hair strand. SNP rs11803731, a nonsynonymous SNP that replaces a lysine with a methionine in the TCHH protein, was previously associated with the straightness of hair (Medland, Nyholt et al., Reference Medland, Nyholt, Painter, McEvoy, McRae, Zhu and Martin2009). Our bioinformatic analysis also indicates rs11803731, rs12130862 and other SNPs in LD with these influence the expression of other nearby genes, including HRNR expressed in the epidermis and with a suggested role in keratinocyte differentiation (Henry et al., Reference Henry, Hsu, Haftek, Nachat, de Koning, Gardinal-Galera and Simon2011). Further studies are, however, required to determine whether these additional genes have roles in influencing hair morphology, and TCHH remains the best candidate gene in this chromosomal region. In sheep, genetic variation in the TCHH gene, as well as differing mRNA expression levels over the lifetime, are known to influence wool straightness in various sheep breeds (Gong et al., Reference Gong, Zhou, Tao, Li, Dyer, Luo and Hickford2018, Jie et al., Reference Jie, Yufang, Lu and Ming2019). Several studies of human hair have also indicated a role for TCHH in the biology and evolution of straight hair (Koch et al., Reference Koch, Shriver and Jablonski2019) and in the straightening of the hair follicle (Steinert et al., Reference Steinert, Parry and Marekov2003, Wu et al., Reference Wu, Latendorf, Meyer-Hoffert and Schröder2011).
The extent of support for the TCHH signal was stronger for the observational measure than for the quantitative measure of hair curvature. We hypothesize that this is due to either the difference between the macro (observational) and micro (quantitative) scales of measurement, or to the sampling of only a relatively small number of hairs taken from a single area on the head for the quantitative measure. Nevertheless, the quantitative measure GWAS did detect the signal previously observed at the TCHH locus, matching the results of the observational measures in the same individuals and serving as a positive control to reconfirm the accuracy of observational measures.
To view supplementary material for this article, please visit https://doi.org/10.1017/thg.2020.78.
We would like to thank all twins and their families who participated in the BLTS and provided their valuable genotype and phenotype measurements, as well as nurses who participated in the categorical coding of hair curvature. Additional thanks are due to Mark Brims for making available hair measurements from the OFDA.
Funding was provided by the Australian Research Council (ARC LP110100121), the Australian National Health and Medical Research Council (NHMRC), Visigen, Identitas Inc., Australian Federal Police Forensics, Victoria Police Forensic Services Department and Unisys Australia. Yvonne Ho was supported by an Australian Government Research Training Program Scholarship.
Conflict of Interest