We recently reported results of the first longitudinal twin study of reading difficulties (RD) and attention deficit-hyperactivity disorder (ADHD) symptom dimensions in the Colorado Learning Disabilities Research Center (CLDRC; DeFries et al., 1997), from a sample of twin pairs selected for RD (Wadsworth et al., 2015). The purpose of that study was to assess the etiology of the stability of RD as well as the etiology of comorbidity between RD and ADHD symptom dimensions both contemporaneously and longitudinally, using univariate and bivariate DeFries–Fulker (DF) analyses (DeFries & Fulker, 1985, 1988). Reading composite data based on the Reading Recognition, Reading Comprehension, and Spelling subtests of the Peabody Individual Achievement Test (PIAT; Dunn & Markwardt, 1970) and ADHD symptom dimensions (inattention [IN] and hyperactivity/impulsivity [H/I]) from the Disruptive Behavior Rating Scale (DBRS; Barkley & Murphy, 1998) were analyzed from twin pairs in which at least one member met proband criteria for RD at initial assessment, and in which both members of the pair had data from a follow-up assessment approximately 5 years later. In order for an individual to be classified as RD, he or she was required to have a positive history for reading problems and be classified as affected by scores on the reading composite. Additional diagnostic criteria included a verbal or performance IQ score of at least 85 on the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) or the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981); no evidence of neurological problems; and no uncorrected visual or auditory acuity deficits.
The subjects ranged in age from 7.7 to 20.5 years (average age of 11.6 years) at initial assessment, and from 12.6 to 26.6 years (average age of 16.2 years) at follow-up.
The genetic etiologies of RD and of the comorbidity between RD and ADHD at the initial measurement occasion were assessed by DF analyses of data from 767 twin pairs for the univariate analysis of RD and 345 pairs for the bivariate analyses of RD and ADHD. In addition, data were analyzed from 94 twin pairs in which at least one member of each pair met proband criteria for RD and for whom reading data were available at both measurement occasions, as well as from 88 twin pairs that also had ADHD data at follow-up. Results of these analyses indicated that more than 60% of the proband deficit in reading at initial assessment was due to genetic influences, and that reading deficits at follow-up were substantially due to these same genetic influences (bivariate $h^2_g$ = 0.79 ± 0.22). Results of bivariate DF analyses of initial reading and both initial and follow-up symptoms of IN indicated that genetic influences accounted for 60% of the contemporaneous relationship and approximately two-thirds of the longitudinal relationship (bivariate $h^2_g$ = -0.68 ± 0.33). In contrast, bivariate $h^2_g$ estimates for the comorbidity between initial reading and both contemporaneous and follow-up H/I symptoms were small and non-significant (Wadsworth et al., 2015). In summary, our previous findings based on analyses of data from the CLDRC selected sample indicated strong genetic influences on RD at initial assessment, as well as on the comorbidity between RD and IN at initial and follow-up assessments (see Table 1).
Note to Table 1: All p values are one-tailed. DF = DeFries–Fulker.
Recently, it has been noted that many statistically significant findings in the behavioral sciences have not replicated (Pashler & Wagenmakers, 2012; Plomin et al., 2016). A recent attempt to replicate findings of 100 such studies found that 64% failed to replicate (Open Science Collaboration, 2015). In an attempt to replicate 17 brain-behavior studies, Boekel et al. (2015) found that none replicated. Results of attempts to replicate medical findings have been similarly discouraging, with five of six non-randomized designs failing to replicate (Ioannidis, 2005). These and similar results have led to claims that 85% of research resources are wasted (Macleod et al., 2014).
The International Longitudinal Twin Study of Early Reading Development (ILTS; Byrne et al., 2006) and its continuation into high school provide an exceptional opportunity to conduct a replication of our previous study using similar measures and selection criteria, and exactly the same analyses. To accomplish this, we chose those measurement occasions (post-4th grade and post-9th grade) that corresponded most closely in age to the mean ages at initial and follow-up assessments in the CLDRC (11.6 and 16.2 years, respectively) and selected those twin pairs in which at least one member of the pair had RD at post-4th-grade assessment (average age 10.5 years) and follow-up data at post-9th grade (average age 15.5 years).
Subjects in the current study are participants in the ongoing ILTS (Byrne et al., 2006), which includes twins from Australia, the United States, and Scandinavia. However, the subset of twins whose data were used in the current study includes only those participating in the U.S. (Colorado) study. Twins were recruited from birth records, and zygosity was determined from DNA extracted from cheek swabs, or in a minority of cases (28%, most of whom were clearly fraternal) from selected items from the Nichols and Bilbro (1966) questionnaire. All twins were learning to read English at entrance into the study. Twin pairs were selected for analyses if at least one member of the pair had a composite reading score at least one standard deviation below the full sample mean at post-4th grade, and a score on either the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) Vocabulary or Block Design subtest no more than one standard deviation below the sample mean at entry into the study. The subsample selected for RD at the end of 4th grade consisted of 86 twin pairs: 38 monozygotic (MZ; i.e., identical) and 48 same-sex dizygotic (DZ; i.e., fraternal). By post-9th grade, the sample consisted of 34 MZ and 46 DZ pairs.
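The selection rule described above can be illustrated with a minimal sketch. It assumes that all scores have already been standardized against the full sample and that the IQ-proxy criterion is applied to the same twin who meets the reading criterion; the text does not state the latter explicitly. The names `Twin`, `is_proband`, and `select_pairs` are hypothetical and are not part of the ILTS procedures.

```python
from typing import NamedTuple


class Twin(NamedTuple):
    reading_z: float  # composite reading at post-4th grade, z-scored on the full sample
    vocab_z: float    # WPPSI Vocabulary at study entry, z-scored
    block_z: float    # WPPSI Block Design at study entry, z-scored


def is_proband(twin: Twin, cutoff: float = -1.0) -> bool:
    """Reading at least 1 SD below the mean, and at least one IQ proxy
    no more than 1 SD below the mean (assumed to apply to the same twin)."""
    low_reading = twin.reading_z <= cutoff
    adequate_iq = max(twin.vocab_z, twin.block_z) >= cutoff
    return low_reading and adequate_iq


def select_pairs(pairs, cutoff: float = -1.0):
    """Keep pairs in which at least one member meets the proband criteria."""
    return [pair for pair in pairs if any(is_proband(t, cutoff) for t in pair)]


# Synthetic illustration only; these are not study data.
pairs = [
    (Twin(-1.4, 0.2, -0.3), Twin(-0.2, 0.1, 0.4)),   # one member qualifies
    (Twin(0.3, 0.0, 0.5), Twin(0.6, -0.2, 0.1)),     # neither reads poorly
    (Twin(-1.8, -1.5, -1.2), Twin(-0.5, 0.0, 0.0)),  # low reading, both IQ proxies too low
]
selected = select_pairs(pairs)
```

Only the first synthetic pair is retained, because the low-reading twin in the third pair falls below the cutoff on both IQ proxies.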
Procedure and Measures
The measures included in the present analyses are from larger test batteries that were administered in the ILTS in the summer after each school year. Testing at each time point was conducted in a single session in the twins’ homes or schools. Two testers separately assessed each twin at the same time. The following measures were included in the current analyses.
The Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999), Sight Word Efficiency, as well as the Woodcock-Johnson Word ID and Passage Comprehension (Woodcock et al., 2001) were administered at both post-4th grade and post-9th grade.
IN and H/I were measured using nine items relating to IN and nine relating to H/I from the parent and teacher versions of the DBRS (Barkley & Murphy, 1998). These items have been shown to be a valid and reliable measure of ADHD symptoms in children (Lahey et al., 2004; Willcutt et al., 2007).
Verbal and performance IQ
WPPSI Vocabulary, assessed at entry into the study at pre-Kindergarten, was used as a proxy for verbal IQ, and Block Design was used as a proxy for performance IQ.
Multiple regression analysis of twin data
Although qualitative analysis such as a comparison of concordance rates is appropriate as a test for genetic etiology of a dichotomous variable, such as diagnosis of an illness or behavioral disorder, RD and ADHD symptoms occur on a continuum, with somewhat arbitrary cut-off points designating an individual as ‘affected’ or ‘unaffected’. Therefore, DeFries and Fulker (1985) proposed a multiple regression analysis of twin data to assess the etiology of extreme scores on a continuous measure. A basic model was proposed in which a co-twin's score is predicted from the proband's score on the selected trait and the coefficient of relationship (1.0 and 0.5 for identical and fraternal twin pairs, respectively) such that
$$C = B_1 P + B_2 R + A \tag{1}$$
where $C$ symbolizes the co-twin's score, $P$ is the proband's score, $R$ is the coefficient of relationship, and $A$ is the regression constant. $B_1$ is the partial regression of the co-twin's score on the proband's score, a measure of average MZ and DZ twin resemblance. $B_2$ is the partial regression of the co-twin's score on the coefficient of relationship and equals twice the difference between the MZ and DZ co-twin means after covariance adjustment for any difference between MZ and DZ proband means. As a result, $B_2$ provides a direct test for genetic etiology. Further, when the data are appropriately transformed prior to multiple-regression analysis (i.e., each score is expressed as a deviation from the mean of the unselected population and then divided by the difference between the proband and population means), $B_2 = h^2_g$, an index of the extent to which the average deficit of the probands is due to genetic influences (DeFries & Fulker, 1988). For the current analyses, the unselected population is represented by the full population sample of twin pairs at each assessment.
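As an illustration of the transformation and regression just described, the following is a minimal sketch using ordinary least squares in NumPy. It is not the analysis code used in either study. It assumes a single pooled proband mean for the transformation, ignores double entry of concordant pairs and the associated standard-error correction, and uses synthetic scores; the function names `df_transform` and `fit_df` are hypothetical.

```python
import numpy as np


def df_transform(scores, population_mean, proband_mean):
    """Express scores as deviations from the unselected population mean,
    divided by the proband-minus-population mean difference."""
    scores = np.asarray(scores, dtype=float)
    return (scores - population_mean) / (proband_mean - population_mean)


def fit_df(cotwin, proband, relationship):
    """Fit C = B1*P + B2*R + A by least squares; returns (B1, B2, A)."""
    cotwin = np.asarray(cotwin, dtype=float)
    design = np.column_stack([
        np.asarray(proband, dtype=float),
        np.asarray(relationship, dtype=float),
        np.ones_like(cotwin),
    ])
    coef, *_ = np.linalg.lstsq(design, cotwin, rcond=None)
    return tuple(coef)


# Synthetic z-scores for illustration only (population mean assumed to be 0).
proband_raw = np.array([-2.0, -1.5, -2.5, -1.8, -2.2, -1.6, -2.0, -1.9])
cotwin_raw = np.array([-1.8, -1.2, -2.1, -1.7, -1.0, -0.6, -1.1, -0.7])
relationship = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5])  # MZ = 1.0, DZ = 0.5

population_mean = 0.0
proband_mean = proband_raw.mean()  # pooled across zygosity (an assumption)

proband_t = df_transform(proband_raw, population_mean, proband_mean)
cotwin_t = df_transform(cotwin_raw, population_mean, proband_mean)

b1, b2, a = fit_df(cotwin_t, proband_t, relationship)
print(f"B1 = {b1:.3f}, B2 (h2g under the transformation) = {b2:.3f}, A = {a:.3f}")
```

After the transformation the proband mean equals 1, so the fitted B2 can be read as the proportion of the proband deficit attributable to genetic influences, under the assumptions stated above.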
Etiologies of stability and comorbidity
The DF multiple regression model may be extended to assess the relationship between two different phenotypes or the same phenotype at two different time points. For example, to assess the etiology of stability between deficits in reading performance at the two time points, the following bivariate extension of the basic regression model was fitted to proband reading scores at initial assessment and co-twins’ scores at follow-up:
$$C_y = B_1 P_x + B_2 R + A \tag{2}$$
where $C_y$ is the co-twin's score at follow-up ($Y$) and $P_x$ is the proband's score at initial assessment. In the bivariate case, $B_1$ is the partial regression of the co-twin's reading score at follow-up ($Y$) on the proband's initial reading score ($X$), a measure of the average MZ–DZ cross-variable twin resemblance, or the extent to which co-twin scores on $Y$ are related to proband scores on $X$ (in this case, reading) across zygosity. $B_2$ is the partial regression of the co-twin's $Y$ score on the coefficient of relationship. When the data are appropriately transformed, $B_2 = h_x h_y r_{G(xy)}$, an index of the extent to which the proband deficit on $X$ is due to genetic factors that also influence scores on $Y$, that is, ‘bivariate heritability’ (Light & DeFries, 1995). $r_{G(xy)}$ is the genetic correlation, an index of the degree to which individual differences in two variables are due to the same genetic influences. Thus, Equation (2) can also be applied to assess the genetic etiologies of both contemporaneous and longitudinal comorbidities between RD and ADHD symptom dimensions.
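The bivariate model has the same regression form as the basic model, with the co-twin's score on Y as the outcome and the proband's score on X as the predictor. The sketch below is illustrative only: it assumes both variables have already been transformed as the text requires, uses noise-free synthetic data built from known coefficients, and does not address double entry or standard errors. The function name `fit_bivariate_df` is hypothetical.

```python
import numpy as np


def fit_bivariate_df(cotwin_y, proband_x, relationship):
    """Fit C_y = B1*P_x + B2*R + A by least squares; returns (B1, B2, A)."""
    cotwin_y = np.asarray(cotwin_y, dtype=float)
    design = np.column_stack([
        np.asarray(proband_x, dtype=float),
        np.asarray(relationship, dtype=float),
        np.ones_like(cotwin_y),
    ])
    coef, *_ = np.linalg.lstsq(design, cotwin_y, rcond=None)
    return tuple(coef)


# Synthetic, already-transformed scores (illustration only, not study data).
proband_x = np.array([1.10, 0.90, 1.20, 0.80, 1.00, 1.05, 0.95, 1.00])
relationship = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5])  # MZ = 1.0, DZ = 0.5

# Co-twin Y scores generated from known coefficients so the fit can be checked.
true_b1, true_b2, true_a = 0.3, 0.7, -0.2
cotwin_y = true_b1 * proband_x + true_b2 * relationship + true_a

b1, b2, a = fit_bivariate_df(cotwin_y, proband_x, relationship)
print(f"B1 = {b1:.3f}, B2 (bivariate heritability) = {b2:.3f}, A = {a:.3f}")
```

Because the synthetic co-twin scores contain no noise, the fit recovers the generating coefficients; with real data, B2 would be estimated with sampling error.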
In the current study, the etiology of reading deficits at 4th grade was assessed, as well as their longitudinal stability between 4th grade and 9th grade. In addition, both the contemporaneous relations between 4th-grade reading and 4th-grade IN and H/I, and the longitudinal relations between 4th-grade reading and 9th-grade IN and H/I were assessed. In order to provide strictly parallel analyses to the CLDRC analyses, subjects were not reselected at 9th grade.
Table 1 presents results of both the previously published analyses of data from the CLDRC and those of the current study. Although there are some relatively minor differences between the results, the overall pattern of results and indeed most estimates are highly similar. In both studies, the heritability of the group deficit in reading at initial assessment is greater than 60%. Also, in both studies, genetic influences on stability of the reading deficit are greater than 70%. Although the bivariate heritability for initial reading and IN in the CLDRC (-0.60) is larger in magnitude than the corresponding estimate for the ILTS (-0.40), and the difference is even greater for the bivariate heritability of initial reading and follow-up IN (-0.68 vs. -0.33), their confidence intervals overlap substantially. In addition, bivariate heritabilities for initial reading and both initial and follow-up H/I are somewhat lower in magnitude than the corresponding estimates for IN in both studies.
The failure of many findings in the behavioral and biomedical sciences to replicate may have many possible causes, including differences in populations, ages of subjects, measures, diagnostic criteria, and so forth. Thus, the current study, based on analyses of data from a selected subset of a population sample, has attempted to replicate our previous findings from a selected sample using identical analyses, as well as highly similar measures and diagnostic criteria. Sample sizes differed depending on the measures analyzed and samples from which subjects were drawn, but were similar for the bivariate analyses in the two studies. Results obtained from DF analyses indicated that reading deficits at initial assessment and their stability are due substantially to genetic influences in both studies. Also, results of both studies suggested that genetic influences on the comorbidity between initial reading and IN were greater than those on the comorbidity between initial reading and H/I, both contemporaneously and longitudinally.
As indicated by their relatively large confidence intervals, the differences between the CLDRC and ILTS bivariate heritabilities for initial reading and IN may only be due to chance. However, these differences could also be due in part to some minor differences in sample and procedure. First, the CLDRC sample is a selected sample. Although a subset of subjects was selected for these analyses in the ILTS, the selection criteria were not exactly the same, and indeed could not be the same due to differences in measures administered. Second, although the mean ages of subjects at each measurement occasion were similar, there was a wide range of ages at both measurement occasions in the CLDRC, with the range of ages at initial assessment from 7.7 to 20.5 years of age, and at follow-up from 12.6 to 26.6 years of age, whereas in the ILTS, all subjects were post-4th grade and post-9th grade, with little range in age at each assessment. Further, the measures of reading also differed somewhat for the two samples.
Our previous findings of substantial genetic influences for reading deficits and their longitudinal stability are clearly replicated in this independent analysis of twin data. Although the bivariate $h^2_g$ estimates between RD and IN are somewhat lower in this replication study, the bivariate heritability estimates between reading deficits and H/I are relatively low in both studies. Nevertheless, the minor differences between these results clearly illustrate the need for standardization of procedures and measures, as well as the importance of replication.
The continued cooperation of the many families and schools participating in the CLDRC and ILTS, as well as the work of the staff members of these projects, is gratefully acknowledged. The Colorado Learning Disabilities Research Center is supported by grant HD027802; the U.S. Sample of the International Longitudinal Twin Study of Early Reading Development is supported by grant HD038526, and its continuation into high school, Etiology and Neuropsychology of Math, Reading, ADHD, and Their Covariation, by grant HD068728, all from the Eunice Kennedy Shriver Center of the National Institute of Child Health and Human Development (NICHD).
Conflict of Interest
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.