A Short History
The Chinese National Twin Registry (CNTR), established in 2001, is the first national twin registry in China (Yang et al., 2002). It is led by the School of Public Health, Peking University, in collaboration with the Qingdao Center for Disease Control and Prevention (CDC), Dezhou CDC, Zhejiang CDC, Jiangsu CDC, Sichuan CDC, Beijing CDC, Shanghai CDC, Tianjin CDC, Qinghai CDC, Heilongjiang agricultural area CDC, Handan CDC, Yunnan CDC and Harbin Medical University. Professor Liming Li of Peking University is the principal investigator.
The first financial support came from an independent American foundation, the Rockefeller-endowed China Medical Board, from 2001 to 2005. At the outset, the CNTR selected Qingdao, Beijing, Shanghai and Lishui as the first four cities in which to recruit twins. In 2006, an article introducing the CNTR was published in a special issue on twin registries (Li et al., 2006). In 2010, the CNTR obtained a grant from the Chinese government (the Special Fund for Health Scientific Research in Public Welfare, 201002007), which expanded the twin registry from four areas to nine provinces or cities (p/c; Li et al., 2013). From 2015 to 2018, the CNTR continued to receive Chinese government funding (201502006), and by February 2019 it had recruited 61,566 twin-pairs (including multiple sets) in 11 p/c. The demographic information on twins is presented in Table 1. The CNTR has now become the largest twin registry in Asia. The CNTR celebrated its 18th anniversary on January 19, 2019.
Note: MZ = monozygotic twins, DZ = dizygotic twins, M = male, F = female, O = opposite sex. Others include twin-pairs/multiple sets with missing gender or Peas in the Pod Questionnaire (PPQ) information, or twins who answered the PPQ question with ‘I have no idea’.
The CNTR is a voluntary registry. Twins are mainly identified and recruited through the local CDCs, which share data with the residence registries of local public security bureaus and communities, or through public media. The CNTR collects twin data through face-to-face interviews conducted by investigators from the CDC system, which covers the province, city and county levels in China. The CNTR also recruits twins through the Chinese ‘Hukou’ system (an ID schema), which is administered by the public security bureau. Data from the CDCs and the public security bureau are used to identify twins, who are then verified by health workers. Advertisements in print and online media are also used. The study protocol for the Special Cohort Study on Environmental Epidemiology in China was reviewed and approved by the Ethics Committee for Human Subject Studies of the Peking University Health Science Center in 2014 (ID: IRB00001052-14021).
We use the Peas in the Pod Questionnaire (PPQ) for twin zygosity assessment when genotyping information is not available. A total of 1008 twin-pairs recruited in 2001 were assessed by genetic testing. In China, according to our own comparison studies, the accuracy of questionnaire-based zygosity diagnosis can reach 86–90% (Li et al., 2013). There were no statistically significant sex or area differences in the validity of classification based on the questionnaire or on physical features comparison. Therefore, questionnaire-based zygosity assessment in this Chinese adult twin sample can still be regarded as a valid and valuable classification method. Physical features comparisons, however, provide only limited information for zygosity determination (Gao et al., 2006).
For twins with available blood samples, we use several methods: blood group typing; analysis of four or nine short tandem repeat genetic markers; single-nucleotide polymorphism (SNP) genotyping using the Illumina HumanOmniZhongHua-8 BeadChip; and SNP information from the Illumina Infinium Human Methylation 450 BeadChip and the MethylationEPIC ‘850K’ BeadChip arrays. In one study, we recruited 192 same-sex Chinese adult twin-pairs to evaluate the validity of the genetic marker-based and questionnaire-based methods for zygosity determination. We considered relatedness analysis based on genotypes of more than 0.6 million SNPs as the gold standard for zygosity determination. The results of zygosity determination based on the 65 SNPs in the 450k methylation array were all consistent with genotyping. For twin studies that already include genotyping and/or 450k methylation array data, additional zygosity testing is therefore unnecessary, which saves cost (Wang et al., 2015).
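As a rough illustration of marker-based zygosity assignment, the sketch below compares genotype calls at a small SNP panel within a twin pair and labels the pair MZ when nearly all calls agree. The function names, the 0/1/2 genotype coding and the 0.95 concordance threshold are illustrative assumptions, not the procedure or cutoff used by the CNTR, and the genotypes are invented.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the minor allele; None = missing call


def genotype_concordance(twin_a: Sequence[Genotype], twin_b: Sequence[Genotype]) -> float:
    """Fraction of SNPs with identical calls in both twins, ignoring missing calls."""
    if len(twin_a) != len(twin_b):
        raise ValueError("both twins must be typed on the same SNP panel")
    compared = [(a, b) for a, b in zip(twin_a, twin_b) if a is not None and b is not None]
    if not compared:
        raise ValueError("no SNP has a call in both twins")
    matches = sum(1 for a, b in compared if a == b)
    return matches / len(compared)


def classify_zygosity(twin_a: Sequence[Genotype], twin_b: Sequence[Genotype],
                      mz_threshold: float = 0.95) -> str:
    """Label a same-sex pair 'MZ' or 'DZ'; the threshold is a hypothetical value."""
    return "MZ" if genotype_concordance(twin_a, twin_b) >= mz_threshold else "DZ"


# Invented genotypes for one index twin and two possible co-twins.
index_twin = [0, 1, 2, 1, 0, 2, 1, 1]
identical_cotwin = [0, 1, 2, 1, 0, 2, 1, 1]
fraternal_cotwin = [0, 2, 2, 0, 1, 2, 1, 0]

print(classify_zygosity(index_twin, identical_cotwin))
print(classify_zygosity(index_twin, fraternal_cotwin))
```

A threshold slightly below 1 leaves room for genotyping error in MZ pairs; in practice the cutoff would need calibrating against a gold standard such as genomewide relatedness.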
The twin data were collected through face-to-face interviews with investigators and questionnaires that were completed by interviewees. The CNTR twins were asked to complete one of the two questionnaires: a simple questionnaire for twins under 18 years and a complex questionnaire for adults ≥18 years. The first questionnaire for twins aged below 18 years collected demographic information, parents’ names, birth weight, current weight and height, medical history and zygosity. Adult twins aged 18 years and over completed a more complex questionnaire that includes questions on demographic information, socioeconomic status, birth weight, birth defects, birthplace, whether reared apart, current height, weight, waist circumference, zygosity, smoking, drinking, fruit and vegetable consumption, physical activity, medical history, allergic history and family medical history. Twins are currently followed up and their contact information, height, weight and waist measurements, and the onset of new chronic diseases are updated when possible.
The CNTR Biobank
Fasting blood samples from 1008 twin-pairs, regardless of disease status, were collected during 2001–2002. Only 579 twin-pairs were followed up during 2004–2005. Basic biochemical tests and DNA extraction were performed. From 2010, we collected further blood samples from 1196 disease-discordant pairs and 577 disease-concordant pairs. The number of twin-pairs who provided a blood sample is presented in Table 2. Serum lipids, glucose, glycosylated hemoglobin, insulin, high-sensitivity C-reactive protein and creatinine were tested in these serum samples.
Note: Obesity: BMI ≥ 28 kg/m² based on measured height and weight. Other diseases are based on self-reported diagnosis by a county-level hospital or above, or on taking therapeutic drugs for hypertension/diabetes/hyperlipidemia/cancer. Twins reared apart: separated for at least 1 year before 11 years of age.
1 Includes both same-sex and opposite-sex twin-pairs.
Genotyping and Quality Control
DNA samples from a total of 480 twins were assessed for integrity, quantity and purity by electrophoresis and NanoDrop measurements. A genomewide genotyping scan of 240 pairs was carried out using the Illumina HumanOmniZhongHua-8 BeadChip. Genotyping was performed by a laboratory specializing in Illumina SNP array genotyping, following the standard experimental procedures suggested by the manufacturer. A total of 894,956 SNPs were genotyped in the 240 paired subjects. Genomewide heterozygosity of each individual was estimated to exclude cross-contamination between samples. This test was performed using a subset of 155,588 SNPs pruned for linkage disequilibrium (r² < .3). Two samples showed signs of contamination (mean observed heterozygosity = .3404, standard deviation = .0052), and those two samples and their twin siblings were excluded. Finally, 180 pairs of twins with 695,406 SNPs remained after the quality control filters.
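The heterozygosity check described above can be sketched as follows: compute each sample's observed heterozygosity, flag samples that lie far from the cohort mean, and extend the exclusion to their co-twins. The 3 standard deviation cutoff, the function names and the sample identifiers are assumptions made for illustration; the source does not state the exact exclusion rule.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence


def heterozygosity_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Share of heterozygous calls (coded 1) among non-missing 0/1/2 genotype calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("sample has no genotype calls")
    return sum(1 for g in called if g == 1) / len(called)


def flag_heterozygosity_outliers(rates: Dict[str, float], n_sd: float = 3.0) -> List[str]:
    """Return sample IDs whose rate is more than n_sd SDs from the mean (assumed rule)."""
    values = list(rates.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return sorted(s for s, r in rates.items() if abs(r - mu) > n_sd * sd)


def expand_to_cotwins(flagged: Sequence[str], cotwin_of: Dict[str, str]) -> List[str]:
    """Add the co-twin of every flagged sample, mirroring pair-level exclusion."""
    return sorted(set(flagged) | {cotwin_of[s] for s in flagged if s in cotwin_of})


# Invented rates: 20 typical samples and one with excess heterozygosity.
rates = {f"s{i:02d}": 0.34 for i in range(20)}
rates["s20"] = 0.45
cotwin_of = {"s20": "s19", "s19": "s20"}

flagged = flag_heterozygosity_outliers(rates)
print(expand_to_cotwins(flagged, cotwin_of))
```

Excess heterozygosity is the usual signature of a sample mixed with another person's DNA, which is why the check runs on linkage-disequilibrium-pruned SNPs before any pair-level analysis.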
Genomewide Methylation Profiling
Overall, 118 monozygotic (MZ) and 97 dizygotic (DZ) twin-pairs were tested using the Illumina 450k methylation array, and 87 MZ and 51 DZ twin-pairs using the Illumina 850k methylation array in the CNTR. DNA methylation levels were expressed as beta-values ranging from 0 to 1. The beta-value was calculated as M/(M + U + 100), where M and U represent the methylated and unmethylated signal intensities, respectively. For methylation quality control, we used the 65 SNPs to determine the genotype of each sample and compared it with the genotype calls based on the ZhongHua-8 BeadChip data, in order to identify any sample mix-ups during the methylation experiment. Sample-level and probe-level quality control showed that all samples passed the Illumina quality control. Samples with a detection p value greater than .01 at 1% of sites were subject to removal (no samples met this criterion). Sites with a detection p value greater than .01 in 1% of samples (4019 sites) or with bead counts below 3 in 5% of samples (1499 sites) were removed. In addition, sites containing SNPs with a minor allele frequency of at least 5% were excluded, because probe binding might be affected by SNPs in the binding area. Finally, a total of 817,471 probes that passed the quality control were included.
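To make the beta-value definition and the probe-level filters concrete, here is a minimal sketch. The formula M/(M + U + 100) and the numeric cutoffs come from the text; the function names and the choice to treat the percentage thresholds as strict inequalities are assumptions, and the intensities are invented.

```python
from typing import Sequence


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset); the offset stabilizes low-intensity probes."""
    return methylated / (methylated + unmethylated + offset)


def keep_probe(detection_p: Sequence[float], bead_counts: Sequence[int],
               p_cutoff: float = 0.01, max_failed_p_frac: float = 0.01,
               min_beads: int = 3, max_low_bead_frac: float = 0.05) -> bool:
    """Apply the two probe-level filters across all samples for one CpG site.

    Assumption: a probe is dropped when the failing share of samples is strictly
    greater than the stated percentage; the source does not specify the boundary.
    """
    frac_bad_p = sum(1 for p in detection_p if p > p_cutoff) / len(detection_p)
    frac_low_beads = sum(1 for b in bead_counts if b < min_beads) / len(bead_counts)
    return frac_bad_p <= max_failed_p_frac and frac_low_beads <= max_low_bead_frac


# Invented example: 100 samples measured at three probes.
good_p, good_beads = [0.001] * 100, [10] * 100
noisy_p = [0.5] * 2 + [0.001] * 98      # 2% of samples fail detection
sparse_beads = [1] * 10 + [10] * 90     # 10% of samples have fewer than 3 beads

print(beta_value(900, 0))
print(keep_probe(good_p, good_beads))
print(keep_probe(noisy_p, good_beads))
print(keep_probe(good_p, sparse_beads))
```

Because of the constant offset, a fully methylated probe never reaches a beta-value of exactly 1, and a probe with no signal at all yields 0 rather than a division by zero.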
Distribution of Discordant or Concordant Twin-Pairs
The co-twin control study design involves twins discordant for specific exposures such as environmental factors, and twins discordant for disease outcomes or measures of morbidity.
Disease-discordant twins, particularly MZ twins, are excellent subjects for matched case-control studies in which the confounding effects of age, sex, genetic background, and intrauterine and early environments are perfectly controlled. Therefore, analyses of differences in outcome against differences in exposure, within- and between-pair models, and conditional logistic regression can be used for potential causal inference. Thus, both disease-discordant and disease-concordant twin-pairs are invaluable resources.
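For a single binary exposure in 1:1 matched pairs, the conditional logistic regression estimate of the odds ratio reduces to the ratio of the two kinds of exposure-discordant pairs. The sketch below uses that simplification on invented counts; the function name and the Wald-type 95% confidence interval are illustrative choices, not the CNTR's analysis pipeline.

```python
import math
from typing import Iterable, Tuple


def matched_pair_odds_ratio(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Odds ratio and 95% CI from disease-discordant twin pairs.

    Each pair is (affected_twin_exposed, unaffected_twin_exposed).
    Pairs concordant for exposure carry no information and are ignored.
    """
    pairs = list(pairs)
    case_only = sum(1 for case, ctrl in pairs if case and not ctrl)
    control_only = sum(1 for case, ctrl in pairs if ctrl and not case)
    if case_only == 0 or control_only == 0:
        raise ValueError("need exposure-discordant pairs in both directions")
    odds_ratio = case_only / control_only
    se_log = math.sqrt(1 / case_only + 1 / control_only)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper


# Invented data: 100 disease-discordant pairs and one binary exposure.
pairs = ([(True, False)] * 30 + [(False, True)] * 10
         + [(True, True)] * 20 + [(False, False)] * 40)

odds_ratio, lower, upper = matched_pair_odds_ratio(pairs)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Only the 40 exposure-discordant pairs contribute to the estimate, which is why large registries are needed to assemble enough informative pairs for any one exposure.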
At the CNTR, our research mainly focuses on cardiometabolic diseases (such as diabetes, obesity and hypertension) and genetic diseases among child and adolescent twins, and on cardiometabolic diseases, chronic bronchitis/emphysema and cancers in adult twins. For twins younger than 18 years, obesity is self-reported following diagnosis by a county-level hospital or above; for twins ≥18 years, obesity is defined as BMI ≥ 28 kg/m² based on self-reported height and weight. All other diseases are self-reported by questionnaire. The three most common discordant diseases are obesity, hypertension and diabetes. The distribution of disease- or lifestyle-discordant and -concordant twin-pairs is listed in Table 3.
Note: Y = disease present; N = disease absent. Obesity: For twins <18 years, obesity is self-reported after diagnosis by a county-level hospital or above; for twins ≥18 years, obesity is BMI ≥ 28 kg/m² based on self-reported height and weight.
1 Percentage of discordant twin-pairs among the whole population.
2 Y = current smoker/drinker; N = never smoker/drinker or ex-smoker/drinker. A current smoker is defined as anyone who self-reportedly has smoked one or more cigarettes (or cigars, pipes or any other smoked tobacco products) daily in the past year. A current drinker is defined as anyone who self-reportedly has consumed >50 g of liquor with 52% alcohol by volume daily in the past year.
3 Y = those who eat at least three servings of vegetables and two servings of fruit per day.
4 Y = those who do moderate or vigorous physical activity for at least 30 min at a time on at least 5 days per week. Moderate activities refer to activities that take moderate physical effort and make breathing somewhat harder than normal. Vigorous physical activities refer to activities that require hard physical effort and make breathing much harder than normal.
Advances of the Twin Cohort
Both genetic and environmental factors contribute to cardiometabolic health. The CNTR comprehensively examined the genetic and environmental effects on variance in weight, height and body mass index (BMI) in twins under 18 years of age. We found that heritability for weight, height and BMI was low at 0–2 years old (less than 20% for both sexes) but increased over time. Genetics therefore appear to play an increasingly important role in explaining the variation in weight, height and BMI from early childhood to late adolescence, particularly in boys. Common environmental factors exert their strongest and most independent influence in the pre-adolescent period, and more significantly in girls (Liu et al., 2015). Our findings emphasize the need to target family and social environmental interventions in the early childhood years, especially for females (Liao et al., 2018).
We further quantified and compared the associations of various body composition measurements with serum metabolites, and the degree to which genetic or environmental factors affect the obesity–metabolite relation. Adiposity showed significant associations with serum metabolite concentrations. Shared genetic factors accounted for 64–81% of these phenotypic correlations, and the remaining 19–36% was attributed to unique (twin-specific) environmental factors (Liao et al., 2015). Interestingly, a structural equation model adjusting for age and sex found that vigorous exercise significantly moderated the additive genetic effects and shared environmental effects on BMI. The genetic contribution to BMI was significantly lower for people who adopted a physically active lifestyle (h² = 40%) than for those who were relatively sedentary (h² = 59%). The observed gene–physical activity interaction was more pronounced in men than in women. Our findings therefore suggest that adopting a physically active lifestyle may help to reduce the genetic influence on BMI in the Chinese population (Wang et al., 2016).
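The heritability estimates reported here come from structural equation models. As a much simpler stand-in that conveys the underlying logic, Falconer's formulas approximate the additive genetic (h²), shared environmental (c²) and unique environmental (e²) shares of variance from the MZ and DZ twin correlations. The sketch below is not the authors' method, and the correlations are invented.

```python
from typing import Dict


def falconer_estimates(r_mz: float, r_dz: float) -> Dict[str, float]:
    """Rough variance decomposition from twin-pair correlations.

    h2 = 2 * (r_mz - r_dz)   additive genetic share
    c2 = 2 * r_dz - r_mz     shared (common) environment share
    e2 = 1 - r_mz            unique environment share, including measurement error
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie between -1 and 1")
    return {
        "h2": 2 * (r_mz - r_dz),
        "c2": 2 * r_dz - r_mz,
        "e2": 1 - r_mz,
    }


# Invented correlations for a trait such as BMI.
estimates = falconer_estimates(r_mz=0.8, r_dz=0.5)
for component, share in estimates.items():
    print(f"{component} = {share:.2f}")
```

The three components sum to 1 by construction. Unlike a fitted model, these formulas give no confidence intervals and can return values outside the 0 to 1 range when the DZ correlation is less than half, or more than, the MZ correlation.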
Future Plans and Systems Epidemiology
Remarkable advances in omics technologies, including genomics, metabolomics and proteomics, and in the study of the human microbiome have provided many new opportunities for epidemiologic research. Integrating these omics technologies into epidemiology through a ‘systems epidemiology’ approach can help to understand the etiology of chronic diseases, discover novel biomarkers and identify high-risk populations to target for precision intervention.
In the next step, more fasting blood samples will be collected from the disease/exposure-discordant twins, who will undergo a detailed physical examination. Matched case-control and cohort studies will be conducted in these discordant twins. Omic data, including genetics, genomics, metabolomics and proteomics, and gut microbiome data will be generated. The integration of omics and digital technologies in public health will advance our understanding of precision public health. A research agenda that incorporates a multidisciplinary approach across the life cycle, leverages new technologies and addresses new challenges should therefore be most effective for the prevention of chronic diseases.
Aims of the Twin Cohort
The CNTR aims to investigate the genetic and environmental contributions to complex diseases, with particular emphasis on cardiovascular diseases. During the past 18 years of growth, however, the CNTR has not only provided important insights into cardiovascular diseases but also served as a valuable resource for a broad range of study areas. The CNTR has collected data on health, lifestyle and behavior, as well as fasting blood samples. Longitudinal follow-ups and surveillance of major chronic diseases are also planned. Our research will therefore extend to examining the causal relationship between environmental risk factors and chronic diseases using the disease-discordant and exposure-discordant twins.
With the development of diverse high-throughput technologies, the rapidly evolving field of omic data offers the potential to study health and disease in breadth and depth at the human population level. Therefore, we also plan to use systems epidemiology as a novel approach to study the complexities of human pathophysiology or identify new trans-omic biomarkers by integrating genetics, gut microbiota, epigenetics, genomics, metabolomics and population phenotype data.
The CNTR is supported by the Special Fund for Health Scientific Research in Public Welfare, China (201002007, 201502006), the Key Project of the Chinese Ministry of Education (310006), the National Natural Science Foundation of China (81573223, 81473041, 81202264, 81711530051) and the China Medical Board (01-746). We gratefully acknowledge support from the Centers for Disease Control and Prevention in Qingdao, Dezhou, Zhejiang, Jiangsu, Sichuan, Beijing, Shanghai, Tianjin, Qinghai, Heilongjiang agricultural area, Handan and Yunnan, and from the School of Public Health, Harbin Medical University.