The common wisdom concerning second language (L2) learning is “the younger, the better”, with child L2 learners being expected to quickly and successfully catch up to their native-speaker peers. Research on age of acquisition onset (AoA) largely supports the common wisdom when it comes to comparing the L2 proficiency of individuals who began to learn a L2 in adulthood vs. childhood (e.g. DeKeyser, 2012). However, less is known about whether learning a L2 at different AoAs within the childhood years makes a difference in how long it takes for children to become identical to native speakers of that language, and if all of them do so. Contrary to what is commonly expected, some studies have shown that adults who began to learn the societal language as a L2 in childhood do not always possess L2 grammar and pronunciation equivalent to native speakers (e.g. Abrahamsson & Hyltenstam, 2009). Also contrary to the maxim of “the younger, the better”, early AoA is no guarantee of developing and maintaining native-like proficiency in a language, since heritage language speakers can experience attrition or incomplete acquisition (Montrul, 2008). Moreover, researchers have found that language input and experiential factors, first language (L1) background, and language-learning aptitude also shape children's L2 development, possibly more so than AoA (e.g. Jia & Fuse, 2007; Paradis, 2011; Unsworth, Argyri, Cornips, Hulk, Sorace & Tsimpli, 2014). Longitudinal long-term outcome studies have rarely been conducted with child L2 learners, and the impact of non-AoA factors on child L2 learners has mainly been studied during the early stages of L2 acquisition.
Accordingly, this study sought to determine if English L2 children, all of whom had AoAs in early childhood, would reach native-speaker levels of accuracy with English verb morphology after long-term exposure to English as a L2 in English-medium schools in an English majority-language city. Our secondary goal was to determine if non-AoA individual difference factors would predict variation in children's L2 morphological abilities at this late stage in L2 acquisition, and in so doing, understand if these factors influence whether or not L2 children converge on native-speaker accuracy with verb morphology in the elementary school years.
Age effects in L2 acquisition
Lenneberg (1967) proposed that maturational, neurological changes around adolescence cause the offset of a critical period for language acquisition such that a language is rarely learned to native-speaker proficiency after this age. Since Lenneberg's proposal, there has been a great deal of research and debate about how AoA impacts L2 acquisition. Researchers currently debate whether AoA effects in L2 attainment are caused by maturational (internal) or external factors, whether AoA impacts grammatical subdomains differentially, and whether there is a specific ‘cut-off’ age or whether AoA effects are continuous across the lifespan (for reviews, see DeKeyser, 2012; Muñoz & Singleton, 2011; Tsimpli, 2014). The research most relevant to the present study concerns the relationship between AoAs in childhood and long-term L2 outcomes.
Several studies with adults have found that non-native ultimate attainment can occur in individuals who began to learn a L2 in early childhood (Abrahamsson & Hyltenstam, 2009; Flege, Munro & MacKay, 1995; Flege, Yeni-Komshian & Liu, 1999; Hakuta, Bialystok & Wiley, 2003; McDonald, 2000; Weber-Fox & Neville, 1999). First, Abrahamsson and Hyltenstam (2009) examined both the grammatical and pronunciation abilities of 195 Spanish first language (L1)–Swedish L2 speakers who had been living in Sweden for at least ten years, but had started to learn Swedish at different ages. They found differences in grammar and pronunciation between Swedish native speakers and Swedish L2 learners even for those L2 learners who began to learn Swedish at age 5;0 or younger, although discrepancies with native-speaker abilities increased along with AoA. Similar to Abrahamsson and Hyltenstam (2009), McDonald (2000) found non-native attainment in grammatical knowledge for Vietnamese L1–English L2 adults, including some who began to learn English at or before age 5;0. Regarding pronunciation, Flege et al. (1995) and Flege et al. (1999) showed that perceivable foreign accents increased continuously with increasing AoA in English L2 speakers with long-term residence in North America, but importantly, perceivable foreign accents were found in some individuals with AoAs < 5;0. However, Flege et al. (1999) found discrepancies with native speakers in English L2 grammatical abilities only for individuals with AoAs > 5;0. Hakuta et al. (2003) examined self-reported general proficiency in English in US census data from millions of respondents whose L1s were either Spanish or Chinese. The graphical data and analyses show that the respondents' English proficiency was indisputably related to the age when they began to learn English, and this decline with age began in the early childhood period. Finally, Weber-Fox and Neville (1999) report a series of studies with Chinese-L1–English-L2 adults where differences between native speakers and L2 speakers were found in grammatical test scores and neuro-processing as a function of increasing AoA, beginning with AoAs of 4;0–6;0. Taken together, this group of studies demonstrates that AoAs in early childhood do not necessarily predict uniform convergence with native-speaker outcomes in L2 grammar and pronunciation. Instead, they suggest that the likelihood of native-like attainment decreases gradually and continuously along with increasing AoA.
Most studies of AoA in L2 acquisition have a developmental retrospective design – that is, they include adult participants with various AoAs – and just a few prospective studies on AoA with L2 children have been conducted to date (Armon-Lotem, Walters & Gagarina, 2011; Jia & Fuse, 2007; Meisel, 2008, 2009; Unsworth, 2013; Unsworth et al., 2014). Meisel (2008, 2009) observed non-native or L2 acquisition patterns in German-L1–French-L2 children with AoAs between 3;0 and 4;0, and proposed that this age range might mark the end of an early sensitive period for L2 morphological acquisition; however, this proposal was based on age-related differences in error patterns in the early stages of French L2 acquisition, not in long-term attainment in French. Unsworth (2013) and Unsworth et al. (2014) did not find that different AoAs influenced bilingual children's abilities with grammatical gender in Dutch and Greek as L2s, except some differences emerged between simultaneous-from-birth bilinguals and early L2 learners in Greek. In contrast, Armon-Lotem et al. (2011) found negative correlations between AoA and L2 grammatical abilities in German and Hebrew by Russian L1 children. However, these studies by Unsworth, Armon-Lotem, and their colleagues included some children with low exposure to the L2, and thus were not exclusively examining AoA effects in late-stage L2 acquisition.
Regarding longitudinal research, Jia and Fuse (2007) found that Chinese-L1 children with AoAs in early childhood had higher levels of accuracy with English L2 verb morphology in spontaneous speech over a five-year period than those with AoAs in late childhood/adolescence. However, the early AoA advantage only emerged for two of six grammatical constructions examined, and the small sample size (N = 10) and heterogeneous AoA spread (5;0–16;0) complicate the conclusions that can be drawn from this study. In sum, methodological issues and conflicting findings indicate more prospective developmental research with child L2 learners is needed to determine when, and under what conditions, they diverge from, or converge with, native speakers in their grammatical abilities. While the studies with adults cited above indicate that divergence can be the long-term outcome for some early AoA L2 speakers, we do not know at what point in development such divergence emerges.
Non-age factors influencing child L2 acquisition
Marinova-Todd, Marshall, and Snow (2000) argue that non-age factors can shape L2 development and outcomes as much or more than AoA. The developmental retrospective studies discussed in the previous section indicate that non-native outcomes for L2 speakers are probabilistic in that there is variation among early L2 learners in their ultimate attainment. This raises the issue of what non-AoA factors contribute to determining L2 acquisition outcomes. Sources of individual differences in L2 abilities can be either environmental, i.e. external to the child, or they can reflect inherent linguistic and cognitive abilities internal to the child. Regarding external factors, individual differences in input quantity, i.e. amount or length of L2 exposure, have been found to predict differences in children's L2 grammatical abilities (Armon-Lotem, Joffe, Abutbul-Oz, Altman & Walters, 2014; Armon-Lotem et al., 2011; Blom & Paradis, 2015; Blom, Paradis & Sorenson Duncan, 2012; Bohman, Bedore, Peña, Mendez-Perez & Gillam, 2010; Chondrogianni & Marinis, 2011; Hoff, Welsh, Place & Ribot, 2014; Marinis & Chondrogianni, 2010; Paradis, 2011; Unsworth, 2013; Unsworth et al., 2014). Quality of linguistic input and experience also plays a role in child L2 acquisition. Higher family socioeconomic status, greater richness of the L2 environment (e.g. frequency and diversity of reading, media use, organized activities, and playing with friends in the L2), greater parental fluency in the L2, and having older siblings in school are associated with stronger L2 grammatical abilities (Armon-Lotem et al., 2011; Bohman et al., 2010; Chondrogianni & Marinis, 2011; V. M. Gathercole, 2007; Hoff et al., 2014; Jia & Aaronson, 2003; Jia & Fuse, 2007; Paradis, 2011). Importantly, Jia and Fuse (2007), Unsworth (2013), and Unsworth et al. (2014) found that input and experiential factors explained more variance in children's L2 grammatical abilities than AoA in their studies. Similarly, in their study with adult L2 speakers, Flege et al. (1999) found that years of education in the United States more strongly predicted English grammatical outcomes than AoA.
In addition to child-external input factors, child-internal factors also influence rate of L2 acquisition. First, research has indicated that children with Chinese L1s (Mandarin and Cantonese) are slower to acquire English L2 verb morphology within the first three years of exposure than children with other L1 backgrounds (Blom & Paradis, 2013, 2015; Blom et al., 2012; Paradis, 2011). These researchers argued that the more protracted English L2 development of Chinese L1 speakers is likely because their languages lack grammatical tense and subject–verb agreement marking and are typologically isolating; therefore, these L2 learners are not experiencing positive transfer from the L1 to the L2. McDonald's (2000) retrospective developmental study also found an L1 effect because Spanish-L1–English-L2 speakers outperformed Vietnamese-L1–English-L2 speakers in their knowledge of English verb inflection, even when both groups' AoAs were in early childhood; furthermore, Spanish-L1–English-L2 speakers performed more like native speakers. A second child-internal factor predicting L2 acquisition is verbal memory skills, a component of language learning aptitude (Dörnyei & Skehan, 2003). Researchers have found verbal memory skills to be predictive of L2 outcomes in instructed/foreign L2 learners (Harley & Hart, 1997; Masoura & S. Gathercole, 1999), and correlated with monolingual children's ability to detect errors with verb morphology (McDonald, 2008). Paradis (2011) found that English L2 children's verbal short-term memory was the strongest predictor of individual differences in their accuracy with English verb morphology in the first three years of learning English, even stronger than length of exposure to the L2.
A third child-internal factor associated with L2 grammatical acquisition is vocabulary size. Research with both simultaneous bilingual and L2 children has revealed that size of vocabulary in a language is associated with grammatical development in the same language (Conboy & Thal, 2006; Marchman, Martínez-Sussmann & Dale, 2004; Simon-Cereijido & Gutiérrez-Clellen, 2009). More specifically, there is an association between L2 vocabulary size and accuracy with L2 verb inflections such as third singular [-s] and past tense (Blom & Paradis, 2013; Blom et al., 2012; Marinis & Chondrogianni, 2010).
In addition to child-level factors, language-level factors can also influence children's accuracy with verb morphology. Language-level factors refer to frequency and distributional properties of the input that all speakers/hearers would be exposed to, and thus are not sources of individual differences at the child level. For the present study, we considered language-level factors pertaining to the inflectional morphemes third singular [-s], regular past [-ed], and past irregular, such as the frequency of an inflected word (verb stem + affix or irregular past, e.g. dug or ran) in the input and allomorph type (third singular -s: [s], [z], [ɪz]; past regular: [t], [d], [ɪd]). Research has shown that the acquisition of English L2 grammatical morphology is sensitive to word frequency and allomorph type (Blom & Paradis, 2013; Blom et al., 2012; Goldschneider & DeKeyser, 2001; Marinis & Chondrogianni, 2010). Regarding frequency of inflected words, this means that L2 learners would be more accurate in using third singular [-s] or the past tense with a verb that appears more frequently in this inflectional form in the input than with another verb that appears less frequently. Regarding allomorph types, these are phonologically conditioned by the verb stem, but are also unevenly distributed in the input since there are more verb stems that take the voiced obstruent [z] or [d] than take the VC allomorphs [ɪz] or [ɪd]. For example, verb stems ending in either voiced consonants or vowels take [z], whereas stems ending in sibilants take [ɪz], and the former comprises a larger set of verb stems in English (Blom & Paradis, 2013; Blom et al., 2012).
It is difficult to disentangle whether phonological or type frequency factors contribute to the later acquisition of the VC allomorphs (cf. Blom & Paradis, 2013), and so for the present study we did not explore this issue.
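The phonological conditioning of the allomorphs described above can be illustrated with a small rule. This is a sketch for exposition only and was not part of the study's analyses; the function names and the simplified phoneme sets are assumptions, and the input is taken to be the final phoneme of the verb stem in IPA.

```python
# Illustrative sketch of English allomorph selection (not the study's code).
# The phoneme sets are simplified assumptions; the input is the
# stem-final phoneme written in IPA.

SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}


def third_singular_allomorph(final_phoneme: str) -> str:
    """Return the third singular -s allomorph for a stem-final phoneme."""
    if final_phoneme in SIBILANTS:
        return "ɪz"  # VC allomorph, e.g. teaches
    if final_phoneme in VOICELESS:
        return "s"   # e.g. walks
    return "z"       # voiced consonants and vowels, e.g. runs


def past_regular_allomorph(final_phoneme: str) -> str:
    """Return the regular past allomorph for a stem-final phoneme."""
    if final_phoneme in {"t", "d"}:
        return "ɪd"  # VC allomorph, e.g. wanted
    if final_phoneme in VOICELESS:
        return "t"   # e.g. raked
    return "d"       # voiced consonants and vowels, e.g. played
```

Because the sibilant and alveolar-stop checks come first, the VC allomorphs apply to the smallest set of stems, which matches the uneven distribution noted above.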
To date, most of the research examining the impact of non-age factors in child L2 has either focused on early stages of L2 acquisition or has included both early- and late-stage L2 children in the study sample. Consequently, it is not well known to what extent these individual difference (external and internal) and language-level factors continue to influence L2 abilities at later stages of acquisition.
Present study: design and research questions
This study sought to determine if English L2 children with AoAs in early childhood would all reach native-speaker levels of accuracy with English verb morphology in production and with judgements of correct/incorrect use after long-term L2 exposure. Children's accuracy with verb morphology was examined over three years; a longitudinal design was chosen so that the shape of developmental trajectories could be examined. Regarding length of L2 exposure, we based our choice of time frame on previous research indicating that L2 children catch up to native speakers in their oral language abilities after approximately four to six years of exposure in preschool/school (Hakuta, Goto Butler & Witt, 2000; Saunders & O'Brien, 2006). More specifically concerning morphology, Jia and Fuse's (2007) study of accuracy with English L2 grammatical morphology showed a plateau or asymptote in development after four to five years of exposure for most morphemes. Also, Marinis and Chondrogianni (2010) found that Turkish-L1–English-L2 children were close or equivalent to native-speaker accuracy with verb inflection by six years of exposure. Accordingly, for the present study, children were examined in their fourth, fifth, and sixth year of exposure to English in preschool/school.
The predictive role of environmental factors and the internal factors of verbal short-term memory and vocabulary size on children's long-term outcomes was examined in this study. Only Chinese L1 children were included in the study. This was because of their protracted acquisition of English morphology demonstrated in previous research (Blom et al., 2012; Jia & Fuse, 2007; Paradis, 2011). Therefore, L1 background was not manipulated as an individual-difference variable. Regarding language-level factors, the impact of word frequency and allomorph on children's accuracy with verb inflection in production was examined.
In brief, English L2 children with Chinese (Mandarin and Cantonese) L1 backgrounds were given tests of production and grammaticality judgements with verb morphology once a year for three years/rounds. These were standardized tests, normed with monolinguals. Monolinguals reach ceiling on these tests by age 6;0, and their scores remain stable and at ceiling as they get older (Rice & Wexler, 2001; Rice, Wexler & Hershberger, 1998). Therefore, assessing how native-like the performance of L2 children with four to six years of exposure to English is on these tests constitutes a fair comparison. Analyses of these longitudinal data focused on addressing the following questions:
1. Is there change across the three rounds in children's scores on the verb morphology tests? If so, does L2 learning appear to be growing or reaching a plateau?
2. Have the L2 children reached native-speaker levels of abilities with verb morphology by the final round?
3. What environmental and child-internal factors influence children's accuracy with L2 verb morphology? What language-level frequency factors influence use of L2 verb inflection?
Child participants were recruited from Cantonese- and Mandarin-speaking families residing in Edmonton, Canada. There are close to 100 non-English languages spoken in Edmonton and 6·8% and 3·9% of the population report speaking Cantonese and Mandarin, respectively, at home (Statistics Canada, 2011). Parents had to be both foreign-born and native speakers of a Chinese language and L2 speakers of English. Children were either Canadian-born or foreign-born, but had to have started sustained and consistent exposure to English in a daycare, preschool, or school programme before age 6;0. While families varied in their use of English at home at the time of testing, as an inclusion criterion, all children had to have been spoken to exclusively or primarily in Chinese by their parents from birth until at least 3;0. Thus, there were no families who had bilingual language use at home starting from the child's birth, but instead the children were raised with primarily Chinese at home and English at daycare/preschool/school in their early years. In general, our sample could be characterized as having a high socioeconomic status background because the majority of the mothers had some post-secondary education. Mean maternal education in years was 14·6 (SD = 3·3). This is not unusual given Canada's point-based immigration system, where higher education levels increase the likelihood of acceptance for immigration.
Participants were chosen to form a cohort with respect to AoA, chronological age, and length of exposure from a larger database of children who had participated in previous studies. First, AoA had to be < 6;0 for inclusion, and in our sample the mean AoA was 4;2 (SD = 1;0, range = 1;7–5;8). For chronological age, we aimed for a mean of approximately 8½ years at Round 1, and recruited children whose age was no more than ±12 months of this mean. For length of exposure at Round 1, we aimed for a mean of approximately 4½ years, and recruited children whose length of exposure was no more than ±6 months of this mean. Children's actual chronological ages and length of exposure to English at each round are presented in Table 1.
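The cohort criteria above can be expressed as a simple filter over ages written in the years;months notation used in this paper. The sketch below is illustrative only; the function names are hypothetical, and treating the boundaries of the ±12 and ±6 month windows as inclusive is an assumption.

```python
# Illustrative sketch of the cohort criteria (not the study's code).
# Ages use the "years;months" notation, e.g. "4;2" is 4 years 2 months.

def to_months(age: str) -> int:
    """Convert 'years;months' notation to a number of months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


AOA_LIMIT = to_months("6;0")        # AoA had to be below 6;0
TARGET_AGE = to_months("8;6")       # aimed-for mean age at Round 1
TARGET_EXPOSURE = to_months("4;6")  # aimed-for mean exposure at Round 1


def meets_criteria(aoa: str, age: str, exposure: str) -> bool:
    """Check AoA, Round 1 age, and Round 1 length of exposure."""
    return (
        to_months(aoa) < AOA_LIMIT
        # Inclusive window boundaries are an assumption.
        and abs(to_months(age) - TARGET_AGE) <= 12
        and abs(to_months(exposure) - TARGET_EXPOSURE) <= 6
    )
```

For example, a hypothetical child with an AoA of 4;2, aged 8;8 with 4;6 of exposure at Round 1, would satisfy all three criteria.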
Notes to Table 1: Age is chronological age. Length of exposure means years and months of exposure to English in daycare/preschool/school. Proportion of English use in the home, spoken to the child by family members and spoken by the child to family members, is calculated between 0 and 1·0, with 1·0 as only English being used/spoken. Richness of the English environment is calculated between 0 and 1·0, with 1·0 as the richest possible English environment. CTOPP is Wagner et al. (1999), and the non-word repetition subtest is a measure of verbal short-term memory. These are standard scores, mean = 10, 1 SD range = 7–13. The PPVT-IV is Dunn and Dunn (2007) and measures receptive vocabulary size. These are standard scores, mean = 100, 1 SD range = 85–115.
In the sample, 10/18 children had Cantonese as their L1 and 8 had Mandarin as their L1. Both languages are typologically isolating and do not mark tense grammatically or have subject–verb agreement morphology (Lin, 2001; Matthews & Yip, 1994); furthermore, previous research has found that children from both these Chinese language backgrounds display more protracted acquisition of verb morphology in English than children whose L1s are typologically inflecting and mark tense and agreement grammatically (Blom et al., 2012; Paradis, 2011). Nevertheless, we examined if there were any consistent between-group differences based on L1 for our dependent variables in the present study. Non-parametric comparisons were used because of small Ns. Mann–Whitney U tests comparing the scores between Cantonese- and Mandarin-speaking children for all TEGI probes (see ‘Materials and procedure’; the past tense probe was divided into regular and irregular scores) at all rounds were conducted, 24 comparisons in total. Results yielded 23/24 non-significant differences and one marginally significant difference, for third singular [-s] at Round 3 (p = ·059). Based on this analysis, we judged that grouping the children together was justified.
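For readers unfamiliar with the Mann–Whitney U statistic, the sketch below shows how it can be computed by counting pairwise comparisons between two groups. It is illustrative only and is not the software used in this study; the function name is hypothetical, and no p-value is computed here (a statistics package would normally be used for that).

```python
# Illustrative sketch of the Mann-Whitney U statistic (not the study's code).

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b), counting pairwise wins; ties count as 0.5."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b
```

The smaller of the two values is conventionally reported as the test statistic. In the present design, one such comparison would be run per TEGI score per round.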
Materials and procedure
Children were tested in their homes once a year, and parents were given a questionnaire during the home visits. The tests measured children's abilities with verb morphology, their verbal short-term memory, and their receptive vocabulary size. The parent questionnaire was used to obtain information on a child's quantity and quality of English input. The verb morphology constructions targeted in testing were: third person singular (3rd sing. -s), e.g. he walks; past regular, e.g. he walked, and past irregular, e.g. he ran; BE auxiliary and copula, e.g. they are walking, he is happy; and DO auxiliary, e.g. does he walk every day? What follows is a description of the tests and the questionnaire used to obtain our dependent and independent variables.
Test of Early Grammatical Impairment (TEGI). The TEGI (Rice & Wexler, 2001) was administered to the children, following the instructions in the Examiner's Manual. The TEGI includes production probes for the use of 3rd sing. [-s], past regular [-ed], past irregular, BE and DO (in questions and statements). The TEGI also has grammaticality judgement (GJ) probes for correct use, incorrect use, or omission of verb morphemes.
For the 3rd sing. [-s] probe, children were shown pictures of professionals engaged in work activities and given prompts like “Here is a teacher. Tell me what a teacher does.” Expected answers included “A teacher writes on the board” or “A teacher teaches”. Ten items elicited 3rd sing. [-s] responses. For the past tense probe, participants were shown pictures of children engaged in activities, followed by a picture showing the activity being completed, and given prompts like “Here the boy is raking. Now he is done. Tell me what he did.” The expected answer would be “The boy/he raked”. Ten items elicited regular past tense [-ed] and eight items elicited irregular past tense forms.
In the BE/DO probe, children were prompted to ask questions or make statements using these morphemes. There were thirty-six items in total, twelve eliciting BE copula, thirteen BE auxiliary, and eleven DO. In this task, the child had to direct his or her questions to a puppet about one or more stuffed animals, or make statements about the animals. Thus, third singular and plural questions and statements were elicited using is/are and do/does. For instance, “I wonder if the bears are resting. Ask the puppet” was expected to prompt “Are the bears resting?” (BE auxiliary), and “I wonder about the Kitty. Ask the puppet if the Kitty is hungry” was expected to prompt “Is the Kitty hungry?” (BE copula).
In the TEGI GJ probe, the experimenter acted out a scenario with toys that included two robots who, the children were told, were just learning to speak English and did not say everything correctly. During the scenario, the children were asked to determine if the robots' statements were said correctly or incorrectly (“right” or “not so good”). There were thirty-five test items in this probe. The TEGI GJ probe examines children's ability to distinguish correct use from three error types: omission of verb morphemes (Dropped Marker, e.g. he jumps over there/*he jump over there or he is jumping over there/*he jumping over there), incorrect use of morphemes (Bad Agreement, e.g. he is jumping/*he am jumping), and omission of the progressive [-ing] (Dropped ING, e.g. he is jumping/*he is jump).
For the production probes, a proportion correct score for the morpheme targeted was calculated by dividing the child's correct responses by the total of scorable responses. Unscorable responses are those that were imitating the experimenter's prompt, off-topic, or included a completely different verb construction, e.g. present progressive on the past tense probe. Because this study included older children with long-term exposure to English, unscorable responses were uncommon. For the past tense probe, the TEGI scoring procedures include accepting over-regularized past tense forms, e.g. digged for dug, as correct. For the purposes of this study, we scored over-regularizations as incorrect when we separated regular from irregular past in the analyses, but used the TEGI procedure when comparing a child's score to the TEGI norms. For the GJ probe, following the Examiner's Manual, children's correct rejections, false alarms, misses, and hits were calculated and transformed into A-prime scores for Dropped Marker, Bad Agreement, and Dropped ING separately (Rice & Wexler, 2001; Rice, Wexler & Redmond, 1999).
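The two scoring steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the scoring code used in the study: the function names are hypothetical, and the A-prime formula shown is a widely used non-parametric sensitivity index; whether it matches the Examiner's Manual computation in every detail (including which responses count as hits and false alarms) is an assumption.

```python
# Illustrative sketch of the scoring steps (not the study's code).

def proportion_correct(n_correct: int, n_scorable: int):
    """Correct responses divided by scorable responses.

    Unscorable responses are excluded from the denominator.
    Returns None when there are no scorable responses.
    """
    if n_scorable == 0:
        return None
    return n_correct / n_scorable


def a_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Non-parametric sensitivity index; 0.5 is chance, 1.0 is perfect.

    Assumed formula, not confirmed against the TEGI manual.
    """
    h, f = hit_rate, false_alarm_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

For a hypothetical child with eight correct responses out of ten scorable items, the proportion correct is 0.8. A hit rate of 0.9 with a false alarm rate of 0.1 gives an A-prime of roughly 0.94.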
The Alberta Language Environment Questionnaire (ALEQ). The ALEQ (Paradis, 2011; <http://www.linguistics.ualberta.ca/CHESL_Centre/Questionnaires.aspx>) was administered to one parent, usually the mother, and with the aid of an interpreter if needed. This instrument includes questions on various topics, including current language use by family members in the home and the richness of the child's English environment. Information on language use in the home was gathered through questions such as What language does the mother speak with the child? or What language does the child speak with the mother? where answers were on 5-point rating scales from 0 [English never/mother tongue always] to 4 [English almost always/mother tongue almost never]. The proportion of overall English use in the home (input and output) was calculated from these rating scales for each family member. Richness of the English environment was determined by calculating the number of English-language activities the child engaged in, i.e. book-looking, TV/computer watching, reciting songs/rhymes, extracurricular activities, playing with English-speaking friends, and the frequency of these activities per week, to yield a proportional score from 0 to 1·0. Other information gathered from the ALEQ was used for inclusion criteria like AoA and length of exposure to English in school, and to gauge socioeconomic background (maternal education). English-use-in-the-home functioned as the variable measuring quantity of English input outside school, and English richness was the variable measuring quality of English input outside school. It is important to point out that quantity and quality cannot be entirely separated, and we did not have research questions comparing quantity versus quality of input, but instead considered both variables to be measures of environmental factors potentially influencing children's L2 abilities.
Descriptives for the variables of English-use-at-home and English richness are given for each round in Table 1. Note that the mean use of English among family members ranged from ·36 to ·40 across the rounds, indicating that Chinese was being spoken more often than English among most families at home.
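One way a proportion of English use could be derived from the 0–4 rating scales is sketched below. The exact ALEQ computation is not reproduced in this section, so dividing the summed ratings by the maximum possible total is an assumption, and the function name is hypothetical.

```python
# Illustrative sketch of a home English-use proportion (not the ALEQ's
# documented scoring; the formula is an assumption).

MAX_RATING = 4  # 0 = English never ... 4 = English almost always


def english_use_proportion(ratings) -> float:
    """Summed 0-4 ratings divided by the maximum possible total."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 0 <= rating <= MAX_RATING:
            raise ValueError("ratings must be between 0 and 4")
    return sum(ratings) / (MAX_RATING * len(ratings))
```

Under this assumed formula, hypothetical ratings of 1, 2, 1, and 2 would give a proportion of 0.375, meaning that the mother tongue is used more often than English.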
Comprehensive Test of Phonological Processing (CTOPP). The CTOPP non-word repetition subtest (Wagner, Torgesen & Rashotte, 1999) was administered to the children. This test consists of a list of non-words that increase in length in syllables. Each non-word was played to the child from a CD, and the child was asked to repeat it right after hearing it. The child's responses were recorded for later scoring. Following the CTOPP scoring procedures, children's non-word repetitions were scored as correct (each sound repeated correctly) or incorrect (missing or substituted sounds). Raw scores were converted to standard scores, which are corrected for age, and the descriptives are in Table 1 for each round. For this subtest, the standard mean is 10, with 1 SD range of 7–13. Non-word repetition is a measure of verbal short-term memory (S. Gathercole, 2006), and constituted one of the child-internal predictor variables in this study.
Peabody Picture Vocabulary Test (PPVT-IV). The PPVT (Dunn & Dunn, 2007), a measure of receptive vocabulary size, was also administered to the children. Children were asked to point to an image out of an array of four images that best matched a word spoken by the experimenter. As with non-word repetition, raw scores were converted to standard scores, to correct for age, and descriptives are in Table 1. The PPVT standard mean is 100, with 1 SD range of 85–115.
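Because the two tests use different standard-score scales, it can help to express both in SD units. The Python sketch below uses only the norms stated above (CTOPP subtest: mean 10, 1 SD range 7–13; PPVT: mean 100, 1 SD range 85–115). The names `NORMS` and `to_z` are illustrative and not part of either test's materials.

```python
# Norms as stated in the text: a 1 SD range of 7-13 around 10 implies SD = 3;
# a 1 SD range of 85-115 around 100 implies SD = 15.
NORMS = {"CTOPP": (10.0, 3.0), "PPVT": (100.0, 15.0)}


def to_z(test, standard_score):
    """Express a standard score as a distance from the norm mean in SD units."""
    mean, sd = NORMS[test]
    return (standard_score - mean) / sd


print(to_z("CTOPP", 7))   # -1.0 (lower edge of the 1 SD range)
print(to_z("PPVT", 115))  # 1.0 (upper edge of the 1 SD range)
```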
Language input frequency. For the linear mixed regression analyses, we included two language-level predictor variables, word frequency and allomorph type. The frequencies for the individual inflected verbs for the 3rd sing. [-s] and past tense probes were derived from the Edmonton ELL corpus and used in previous studies (Blom & Paradis, 2013; Blom et al., 2012). The Edmonton ELL corpus consists of the spontaneous speech of native English-speaker research assistants and English L2 children (different from those in the present study) recorded and transcribed for other research purposes. The majority of words in the corpus come from the research assistants. The Edmonton ELL corpus is relatively small (<500,000 words), but it is representative of the speech the children in this study hear because it is based on the speech of individuals in the Edmonton area. Moreover, word frequencies in this corpus have proven predictive of children's accuracy with L2 morphology in prior studies, while those from larger, less representative corpora did not (see Blom & Paradis, 2013, and Blom et al., 2012, for more details). Frequencies were log-transformed and entered into the data frame for each item (verb) on the TEGI probe.
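The log transformation of item frequencies can be sketched as follows in Python. The base of the logarithm and the handling of zero counts are not stated above, so the natural log and the error on non-positive counts are assumptions, and the counts shown are hypothetical rather than taken from the Edmonton ELL corpus.

```python
import math


def log_frequency(count, base=math.e):
    """Log-transform a raw corpus count (natural log by default; an assumption)."""
    if count <= 0:
        raise ValueError("log frequency is undefined for non-positive counts")
    return math.log(count, base)


# Hypothetical counts of inflected forms, for illustration only.
counts = {"walks": 12, "went": 150, "digs": 1}
log_counts = {form: log_frequency(n) for form, n in counts.items()}

# The transform compresses large counts but preserves their rank order.
for form, value in sorted(log_counts.items(), key=lambda kv: kv[1]):
    print(form, round(value, 2))
```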
Change in scores over time
Figures 1 and 2 present a visual display of the change in scores across the three rounds for the TEGI production and grammaticality judgement probes, respectively. The mean scores with the standard deviations are presented in Table 2. We used linear mixed logistic regression modeling with the lme4 package (Bates, Maechler, Bolker & Walker, 2013) in the R statistical programming environment (R Core Team, 2013) in order to address research question (1) above concerning whether these children have reached a plateau in their morphological acquisition. Child, item, and round were random factors, with a random intercept for each item, and a random intercept and slope for each child dependent on round. Regarding random items, rather than aggregating proportion correct or A-prime scores across all items in a probe, for the modeling analysis children's responses were coded as ‘TRUE’ or ‘FALSE’, indicating whether the child gave a correct or incorrect answer to each item within each TEGI probe. For these analyses, slopes from Round 1 to Round 2 were modeled (Model 1), and slopes from Round 2 to Round 3 were modeled (Model 2). If the fixed factor of Round was non-significant in both models, this would indicate a flat curve, i.e. no growth, across the three rounds. If Round was significant from Round 1 to 2 but non-significant from Round 2 to 3, this would indicate a plateau shape to the curve. Because accuracy was so stable and high for Dropped ING, no model was generated for this probe. The z and p values from these analyses are presented in Table 2 for each probe. Results indicate that for the DO probe, no change was found over time in the children's scores, i.e. a flat curve. For the 3rd sing. [-s], past regular, past irregular, GJ Dropped Marker, and GJ Bad Agreement probes, change was found from Round 1 to 2, but not from Round 2 to Round 3, indicating a plateau shape in the curve.
For the BE probe, continued change was found because significant change between rounds was found in both models.
notes: Scores are means (standard deviations), proportion correct or A-prime, for R1 (Round 1), R2 (Round 2), and R3 (Round 3). Individual items were entered into models, not proportion correct or A-prime scores. Model results are z-values followed by p-values for the fixed factor, Round. p-values < ·05 are marked with ‘*’, p-values < ·01 with ‘**’, and p-values < ·001 with ‘***’.
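The decision rule for curve shape and the significance markers used in Table 2 can be written out as a small Python sketch. The function names and the p-values in the usage lines are hypothetical. The case where only the Round 2 to 3 model is significant is not discussed in the text, so the label used for it here is an assumption.

```python
def significance_stars(p):
    """Map a p-value to the markers described in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def curve_shape(p_round1_to_2, p_round2_to_3, alpha=0.05):
    """Classify growth from the significance of Round in Model 1 and Model 2."""
    first = p_round1_to_2 < alpha
    second = p_round2_to_3 < alpha
    if not first and not second:
        return "flat"
    if first and not second:
        return "plateau"
    if first and second:
        return "continued change"
    # Not covered by the text: change only between Rounds 2 and 3.
    return "late change"


# Hypothetical p-values, for illustration only.
print(curve_shape(0.40, 0.70))    # flat
print(curve_shape(0.003, 0.25))   # plateau
print(curve_shape(0.001, 0.02))   # continued change
print(significance_stars(0.003))  # **
```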
Comparisons with monolingual criterion scores
TEGI is a criterion-referenced test. This means that individual scores are compared to a criterion or cut-off score, rather than converted to standard scores or percentiles, to assess whether an individual child's performance meets age expectations. TEGI criterion scores are the lowest scores obtained by children with typical language development in the norming sample. (The TEGI norming sample also included children with specific language impairment.) Each child's score was age-referenced to the appropriate criterion score from the TEGI Examiner's Manual; however, it is important to note that the monolingual typically developing children in the TEGI norming sample showed stable, ceiling performance from about 6;0 until 8;11 on all probes (Rice & Wexler, 2001). At the final round, the children in this study were older than the oldest group from the TEGI norming sample (8;11), and so we used the last criterion score given. This was not problematic because the monolingual children's ceiling scores had not changed over three years and there is no plausible reason to believe their scores would have gone down after that point. One final explanation concerns the past tense probe. As mentioned in ‘Method’, the TEGI scoring procedures assign a correct score for a past irregular form that is over-regularized, e.g. digged instead of dug would be correct because it has morphological marking. This is why past regular and irregular are not separated for this analysis.
Because we were interested in children's long-term outcomes (see research question (2) above), we focused on Round 3 for this comparison with monolinguals. We assigned children a score of 1 (score is at or above criterion score) or 0 (score is below criterion) for each TEGI production and grammaticality judgement probe. Table 3 shows the individual criterion scores for each of the children at Round 3, along with the means and SDs for each probe. Note that for the probe means, 1·0 indicates all children reached criterion; a mean of 0 indicates no child reached criterion. Table 3 shows that 11/18 or 61% of the children had scores below criterion on one or more probes. Note that both Mandarin L1 and Cantonese L1 children are represented among children not meeting criterion scores on the TEGI. Of the 11 children not meeting TEGI criterion scores, none had met the criterion at a previous round for that probe (i.e. no backsliding). Of the 11 children with scores below criterion, 4 had 1 score below, 3 had 2 scores below, 3 had 3 scores below and 1 had 4 scores below. Regarding the probes, the only probe where all children reached criterion was Dropped ING. For BE and Bad Agreement, just 1 child did not reach criterion on these probes, but it was a different child for each probe. For the past tense, 3 children did not meet criterion, for 3rd sing. [-s], 4 children, Dropped marker, 5 children, and DO, 9 children.
notes: ‘MA’ = Mandarin L1; ‘CA’ = Cantonese L1. ‘1’ means the child's score for that probe was at or above the criterion score for their age, based on monolingual norms. ‘0’ means the child's score was below the criterion score.
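The counts reported in the paragraph above can be cross-checked with a short Python tally. All numbers below are taken from that paragraph (Table 3 itself is not reproduced here). The variable names are illustrative.

```python
N_CHILDREN = 18

# Number of children below criterion on each probe at Round 3, as reported in the text.
below_by_probe = {
    "Dropped ING": 0,
    "BE": 1,
    "Bad Agreement": 1,
    "Past tense": 3,
    "3rd sing. -s": 4,
    "Dropped Marker": 5,
    "DO": 9,
}

# Number of children with 1, 2, 3 or 4 below-criterion scores, as reported in the text.
children_by_n_below = {1: 4, 2: 3, 3: 3, 4: 1}

total_from_probes = sum(below_by_probe.values())
total_from_children = sum(k * n for k, n in children_by_n_below.items())
n_children_below = sum(children_by_n_below.values())

# Probe means on the 1/0 coding: the proportion of children at or above criterion.
probe_means = {p: (N_CHILDREN - n) / N_CHILDREN for p, n in below_by_probe.items()}

print(total_from_probes, total_from_children)        # 23 23
print(round(100 * n_children_below / N_CHILDREN))    # 61
print(probe_means["DO"], probe_means["Dropped ING"]) # 0.5 1.0
```

Both ways of counting give 23 below-criterion scores, so the per-probe and per-child breakdowns agree with each other.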
All the children in this study began to learn English before the age of 6;0, but there was still variation in their AoAs. We next examined whether children who did not meet criterion scores for one or more TEGI probes had older AoAs than the children who did meet the criterion scores. The 11 children who did not meet criterion on one or more TEGI probes began to learn English between ages 1;7 and 5;5 (M = 4;2). The 7 children who did meet the criterion scores on all the TEGI probes began to learn English between ages 3;8 and 5;8 (M = 4;2). Thus, a younger age of English onset did not seem to be associated with whether children reached native-speaker levels of performance on the TEGI.
The children had a mean exposure to English of 6;4 at Round 3, but, as with AoA, there was some individual variation in length of exposure at each round. We next examined whether children who did not meet criterion scores on the TEGI at Round 3 had less exposure to English than the children who did meet the criterion scores. The 11 children who did not meet criterion had between 5;5 and 7;4 of exposure to English (M = 6;5); the 7 children who did meet the criterion had between 5;7 and 7;1 of exposure (M = 6;2). Therefore, also as with AoA, differences in exposure time to English among the children at Round 3 did not appear to explain whether or not children's scores met the criterion on the TEGI.
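Ages and exposure times are written in years;months notation, so comparing them requires converting to months. The Python sketch below does this and recombines the two subgroup means into an overall mean. The helper names are illustrative, and because the subgroup means are themselves rounded, the result is only a consistency check, not a recomputation from raw data.

```python
def to_months(age):
    """Convert 'years;months' notation (e.g. '6;5') to a number of months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def to_age(months):
    """Convert a whole number of months back to 'years;months' notation."""
    return f"{months // 12};{months % 12}"


def pooled_mean(groups):
    """Weighted mean of subgroup means; groups is a list of (n, 'Y;M') tuples."""
    total_n = sum(n for n, _ in groups)
    total_months = sum(n * to_months(mean) for n, mean in groups)
    return to_age(round(total_months / total_n))


# Subgroup sizes and means as reported in the text.
print(pooled_mean([(11, "6;5"), (7, "6;2")]))  # 6;4 (length of exposure)
print(pooled_mean([(11, "4;2"), (7, "4;2")]))  # 4;2 (age of onset)
```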
Individual difference and language-level factors
We used linear mixed logistic regression modeling in R to determine which factors most influenced children's performance on each probe across the three rounds. Child, item, and round were random factors, with a random intercept for each item, and a random intercept and slope for each child dependent on round. Fixed factors included child-level (individual difference) and item-level (language-level) factors. Child-level factors were: English-use-at-home (input quantity), English Richness (input quality), CTOPP (non-word repetition – verbal short-term memory), and PPVT (vocabulary size). Item-level factors were word frequency (3rd sing. [-s], past regular and irregular only) and allomorph (3rd sing. [-s] and past regular only).
For the child-level factors, correlations were performed between the values at each round to determine if any were .75 or higher. The highest correlation was .53, observed at Round 3 between CTOPP and PPVT; thus, none were high enough for collinearity to be an issue. The moderate correlation between these factors is not surprising given the well-established relationship between vocabulary size and verbal short-term memory in children (S. Gathercole, 2006). Both factors were entered into the models because, in spite of this relationship, they measure separate constructs, and thus could exert separate influences on children's performance with verb morphology. Moreover, if a correlation in a sample also exists for the entire population, this correlation is not expected to be problematic for a model (Harrell, 2001).
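A collinearity screen of this kind can be sketched in Python as a pairwise Pearson correlation check against the .75 threshold. The scores below are hypothetical and were chosen so that one pair is flagged. In the study itself no pair reached the threshold. The function names are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def collinear_pairs(predictors, threshold=0.75):
    """Return (name_a, name_b, r) for every pair with |r| at or above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical scores for five children, for illustration only.
predictors = {
    "CTOPP": [8, 10, 9, 12, 11],
    "PPVT": [95, 100, 90, 110, 105],
    "English richness": [0.5, 0.4, 0.7, 0.6, 0.3],
}
for name_a, name_b, r in collinear_pairs(predictors):
    print(name_a, name_b, round(r, 2))
```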
Because there were just 18 children in our sample, the final optimal model for each probe was restricted to two child-level fixed factors, following the convention of approximately one factor to ten participants. There were just two item-level factors, allomorph and word frequency. Because each item was considered individually, the number of items was sufficient such that no restrictions on the number of item-level factors for the probes of verb inflection were required. Because we considered four child-level factors in total in our study (English-use-at-home, English richness, CTOPP, and PPVT), we had a three-step process to determine the optimal model for each TEGI probe. The first step was to generate the best full model for each probe, i.e. the best-fitting model (lowest deviance as measured by AIC) with a maximum of two child-level factors, and any number of the item-level factors. So, the child-level factors were entered systematically, in pairs, to generate several models in order to find the best-fitting one. The second step was to determine if the best-fitting full model for each probe was superior to a reduced model with one child-level or item-level factor removed at a time (the difference in deviance produced a significant chi-square value at 1 degree of freedom: χ² > 3·84, p < ·05). If so, the full model was chosen; if not, the reduced model was chosen as the optimal model. Step three consisted of calculating the Concordance Index C for the optimal model to assess whether this optimal model was a good fit. C ranges from .50 to 1·0, and models of .80 or higher are considered to be good-fitting models (Chatterjee & Hadi, 2006). As with the analyses of curve shape over time, because accuracy was so stable and high for Dropped ING, no model was produced for this probe.
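The three steps can be illustrated with a self-contained Python sketch. It does not fit any models (the study's models were fitted in R). It only shows the bookkeeping around them: enumerating the pairs of child-level factors, comparing a full and a reduced model by their difference in deviance at 1 degree of freedom, and computing a concordance index from predicted probabilities. The function names, the example deviances, and the example predictions are hypothetical.

```python
import math
from itertools import combinations

CHILD_FACTORS = ["English-use-at-home", "English richness", "CTOPP", "PPVT"]


def candidate_pairs(factors=CHILD_FACTORS):
    """Step 1: every pair of child-level factors that a full model could contain."""
    return list(combinations(factors, 2))


def chi2_p_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def prefer_full_model(deviance_reduced, deviance_full, critical=3.84):
    """Step 2: keep the full model only if dropping one factor costs more than 3.84."""
    return (deviance_reduced - deviance_full) > critical


def concordance_index(outcomes, predicted):
    """Step 3: share of (correct, incorrect) response pairs ranked in the right order.

    Ties in predicted probability count as half a concordant pair.
    """
    pos = [p for y, p in zip(outcomes, predicted) if y]
    neg = [p for y, p in zip(outcomes, predicted) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are needed")
    score = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                score += 1.0
            elif p_pos == p_neg:
                score += 0.5
    return score / (len(pos) * len(neg))


print(len(candidate_pairs()))            # 6
print(prefer_full_model(410.0, 404.0))   # True (hypothetical deviances)
print(concordance_index([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]))  # 0.75
```

With four child-level factors there are six candidate pairs, which is why several full models had to be generated per probe before the best-fitting one could be identified.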
The summary results of this process are in Table 4. For each TEGI probe, the significant fixed factors and C for the optimal model are listed. For the DO probe, two models are given because they produced nearly equal deviances, and thus a best-fitting model could not be determined between them. For the 3rd sing. [-s] probe, the optimal model of children's performance included PPVT (z = 3·25, p = ·0012) and English richness (z = 3·051, p = ·0023) as child-level predictors, indicating that children with bigger English vocabularies and richer English environments outside school had greater accuracy with this morpheme in production. One item-level predictor, allomorph (allomorph-s: z = 2·17, p = ·0302; allomorph-z: z = 2·89, p = ·0039) was significant, indicating that children were more accurate in adding the 3rd sing. [-s] with verbs taking the allomorphs [-s] and [-z] than the allomorph [-ɪz]. For past regular, children's performance was best predicted by CTOPP (z = 2·92, p = ·0036) and PPVT (z = 2·25, p = ·0247) and allomorph (allomorph-ɪd: –2·84, z = –2·52, p = ·0119). Thus, children with bigger vocabularies and superior verbal short-term memories were more accurate with regular past tense marking. For the allomorph variable, children were less accurate in producing the past tense with verbs taking the allomorph [-ɪd] than with verbs taking [-d] or [-t]. The optimal model for the past irregular forms included both CTOPP (z = 3·40, p < ·0000) and PPVT (z = 2·65, p = ·0079) as predictors. As with past regular, children with bigger vocabularies and superior verbal short-term memories were more accurate with past irregular. In addition, the frequency of the target verb appearing in its irregular past tense form in the input predicted more accurate use of this form by the children (z = 2·56, p = ·0105). For the BE probe, children's responses were best predicted by CTOPP/verbal short-term memory (z = 3·77, p = ·0002) and PPVT/vocabulary size (z = 5·97, p < ·0000). 
Regarding the DO probe, the first model included CTOPP (z = 2·61, p = ·0092) and English-use-at-home (z = 3·54, p = ·0004) as predictors, and the second model included PPVT (z = 2·45, p = ·0141) and English-use-at-home (z = 2·41, p = ·0160). Thus, children were more accurate with DO forms when they heard/spoke more English at home and if they had bigger vocabularies or superior verbal short-term memories. Turning to the grammaticality judgement probes, for the Dropped Marker probe, children's responses were best predicted by a model including CTOPP/verbal short-term memory (z = 2·84, p = ·0045) and PPVT/vocabulary (z = 2·76, p = ·0058). For Bad Agreement, the most parsimonious optimal model included just PPVT/vocabulary (z = 4·70, p < ·0000).
notes: Child-level factors are PPVT (vocabulary size), English richness (richness of the English environment outside school), CTOPP (verbal short-term memory), and English-use-at-home (proportional use of English among family members). Item-level factors are allomorph (allomorphs required by verbs on TEGI probes), and word frequency (frequency of the inflected word form in the input).
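For readers who want to relate the reported z-values to their p-values, the Python sketch below converts a z statistic to a two-sided p-value under the standard normal distribution. That the reported p-values are two-sided normal (Wald) p-values is an assumption here, and the function name is illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-values taken from the results reported above.
for z in (3.25, 2.41, 5.97):
    print(z, f"{two_sided_p(z):.4f}")
```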
This longitudinal study examined the long-term outcomes with English L2 verb morphology of Chinese L1 children who all began to learn English in early childhood (mean AoA = 4;2). Our research questions were aimed at determining (1) if children's developmental trajectories were slowing down/reaching a plateau, (2) if children had reached native-speaker levels of accuracy with the morphemes, and (3) what individual-difference and language-level factors played a role in shaping their abilities with L2 morphology during this late stage of their acquisition.
Developmental trajectories and native-like attainment
Two hallmark characteristics of English monolingual morphological acquisition are that growth reaches a plateau over time (by age 6;0) and that accuracy reaches ceiling at asymptote, with very little individual variation at that point (Rice & Wexler, 1996; Rice, Wexler & Hershberger, 1998). These characteristics are evident in the TEGI norming sample (Rice & Wexler, 2001). Our analyses examined growth over time with L2 morphology and interpreted the L2 children's outcomes relative to those of their monolingual peers.
Overall, the L2 children's proportion correct and A-prime scores on the TEGI were highly accurate by Round 3, where mean scores for all probes were >·80, and for all probes except past irregular and DO, they were >·90. The analyses of developmental trajectories indicated that for the majority of probes, the children were showing a plateau in their growth in performance by the end of the study, since no changes emerged between slopes for Rounds 2 and 3 except for the BE probe. These results point to the possibility that children's development of L2 morphology could be approaching asymptote or, at least, slowing down. In addition, not all the children reached native-speaker levels of accuracy on all the probes, but instead there was variation in outcomes across children and probes. Regarding individual differences among the children, 61% did not meet native-speaker levels of accuracy with one or more probes after a mean of 6;4 of exposure, but importantly, 39% of the children did converge on native-speaker performance for all the probes. Thus, the likelihood of attaining native-speaker accuracy within the time period of this study varied among individuals in spite of all of them having early AoAs. Differences among the probes emerged with respect to children's convergence on native-speaker abilities, ranging from the Dropped ING probe, where 100% of the children reached native-speaker accuracy levels, to the DO probe, where just 50% of the children met native-speaker levels of accuracy. These results point to how children seemed to be acquiring these morphological constructions not as a group but rather on a piecemeal basis. Furthermore, five children did not reach criterion on the Dropped Marker GJ probe, indicating that divergence from native-speaker abilities was not merely a production problem.
A pattern emerged across these results suggesting that verb inflection, particularly inflection marking subject–verb agreement (3rd sing. [-s], DO, Dropped Marker), could be exceptionally problematic for Chinese L1 learners of English. First, the superior long-term attainment with the BE probe in production versus the production probes involving inflectional verb morphology, past tense and 3rd sing. [-s], could be expected given that English L2 children's accuracy with BE outpaces their accuracy with verb inflections at earlier stages of acquisition, and Chinese L1 children are slower to acquire verb inflection in L2 English than children from other L1 backgrounds (Blom et al., 2012; Paradis, 2008; Paradis, Rice, Crago & Marquis, 2008). Second, the most common error children made on the DO probe was use of the plural ‘DO’ when the third singular ‘DOES’ was required, rather than omission of DO, also signalling a problem with agreement inflection. Furthermore, half of the stimuli on the Dropped Marker probe involved omission of 3rd sing. [-s]. An examination of children's scores for stimuli targets with dropped BE vs. verb inflection on this probe indicates higher scores for the former at Round 3 (t = 2·71, p = ·016). Finally, the three probes involving verb inflection related to agreement, 3rd sing. [-s], DO, and Dropped Marker, were the probes for which the largest number of children did not meet criterion by Round 3. That morphology related to subject–verb agreement would be particularly problematic is consistent with Tsimpli's (2014) proposal that semantically vacuous, narrow syntax is more vulnerable to delayed AoA than other grammatical subdomains.
How do these results compare to other studies of long-term outcomes with English L2 verb morphology? Jia and Fuse (2007) found that developmental trajectories of correct use of morphology in spontaneous speech accelerated at first, but reached plateau by the end of five years, paralleling our results. Jia and Fuse found a great deal of variation among children and morphemes, also in parallel with the present study. Regarding native-like attainment, Jia and Fuse used 80% correct use in context as a criterion for ‘mastery’, and found that the early AoA children were more likely to reach mastery for morphology than the older AoA children. If the more conventional criterion of 90% correct use in spontaneous speech (e.g. Brown, 1973) were applied to their data, it would appear that some of their early AoA children did not reach this more stringent criterion for mastery with 3rd sing. [-s], DO, BE, past regular and irregular by the end of five years of exposure. Marinis and Chondrogianni (2010) compared Turkish-L1–English-L2 children's performance on standardized tests of English, including the TEGI, to the performance of monolingual age peers. While this was not a study focused on the influence of AoA on child L2 acquisition, they do report some relevant long-term attainment findings on the TEGI, as the L2 children had an average of four years of exposure to English and AoAs < 6;0. Marinis and Chondrogianni found no differences between the L2 and monolingual children on the TEGI past tense probe, but the L2 children performed worse than monolinguals on the 3rd sing. [-s] probe. Their analysis of individual scores indicated that 5/6 nine-year-old children with six years of exposure did meet age-expected criterion scores for this probe.
Thus, Marinis and Chondrogianni also found variation between probes and individuals, but it seems that convergence with native-speaker performance could be expected for Turkish L1 children by six years of exposure.
This difference between Marinis and Chondrogianni (2010) on one hand, and Jia and Fuse (2007) and the present study on the other, might be due to Turkish being an inflected language and Cantonese and Mandarin being isolating languages. Speakers of isolating L1s like Cantonese, Mandarin, and Vietnamese show greater difficulty in acquiring verb inflection than speakers of languages with rich inflection, like Spanish or Punjabi, in both the early stages and in long-term outcomes (Blom et al., 2012; McDonald, 2000; Paradis, 2011). The particular vulnerability of inflectional morphology in English as an L2 could be a combination of the following: (i) low saliency and cue reliability of morphology in the input; (ii) the filter of L1 phonological constraints on codas and consonant clusters; and (iii) the need to re-focus attention and processing routines in order to acquire grammatical features not present in the L1 (Blom et al., 2012; Ellis, 2008; Flege et al., 1999; Sorenson Duncan & Paradis, in press). Even though low saliency and cue reliability would affect all English L2 learners, the phonological and morphological characteristics of Mandarin and Cantonese would render verb inflection more challenging for speakers of these L1s because they cannot benefit from positive L1 transfer. The variability in long-term outcomes with morphology found in the present study points to the possibility of long-lasting effects of L1 influence even in L2 learners with early AoAs. However, because all the children had a Chinese L1 in the present study, this conjecture needs to be tested with further research including children from other L1 backgrounds.
The present study was designed to look at long-term child L2 outcomes, and it is relevant to ask whether these long-term outcomes might signal children's ultimate L2 attainment. Recall that developmental retrospective studies with adults have found that early AoA L2 speakers do not always converge on native-speaker grammatical abilities, including verb morphology (Abrahamsson & Hyltenstam, 2009; Flege et al., 1999; McDonald, 2000; Weber-Fox & Neville, 1999). Thus, divergence is a possible long-term outcome for the L2 children in this study. Recall also that monolingual children reach ceiling on the TEGI probes by age 6;0, and some child L2 speakers in this study did not converge on native-speaker accuracy even after six years of exposure to English. Furthermore, the shape of the developmental trajectories suggests that the L2 children might not get much closer to native-speaker accuracy in the future, for the DO probe in particular. However, for most probes, a plateau was only evident between Rounds 2 and 3, and a finding of non-native levels of accuracy extending over a longer period of time would constitute more convincing evidence for these speakers having reached their ultimate attainment in the L2. At the limit, we believe this study suggests that even early AoA child L2 learners could be at risk for divergence from monolinguals in their accuracy with English morphology, and that this divergence might be evident in their L2 by four to six years of exposure. Further research with childhood L2 learners with even longer exposure would be needed to draw conclusions about ultimate attainment with certainty.
Non-age predictors of L2 morphological acquisition
The variation observed with respect to children's individual outcomes on the TEGI probes indicates that other, non-AoA, factors were influencing their acquisition, and the results of our linear mixed regression analyses confirm this indication. We found that the child-internal factors, verbal short-term memory (CTOPP) and vocabulary size (PPVT), were the most common predictors, and the environmental factors, English richness and English-use-at-home, appeared less frequently in the models. However, it is notable that English richness did emerge as a significant predictor in other good-fitting but not optimal models for past regular, past irregular, and BE. English-use-at-home was also a significant predictor in a good-fitting model for the BE responses. Thus, while the internal factors were exerting the strongest influence, the influence of environmental factors on children's performance was also present. Regarding language-level factors, we found that allomorph type influenced accuracy with 3rd sing. [-s] and past regular, and word frequency influenced accuracy with past irregular. Because factors like superior verbal short-term memory or richer English input predict higher scores on the TEGI, in turn they predict whether children achieve native-like abilities or not because higher scores are more likely to reach the age-expected criterion. As such, the results of this analysis show that non-AoA factors could play a decisive role in whether or not early L2 learners catch up to their native-speaker peers with L2 verb morphology after four to six years of exposure.
It is relevant to consider whether the secondary role of input factors could have been an artifact of how they were measured in the present study, because English-use-at-home and English richness are composite and indirect (parent report) measures. Prior research indicates that input factors such as diversity of speakers, family composition, and parents' fluency in the L2 can exert independent influences on bilingual children's development (Armon-Lotem et al., 2014; Hoff et al., 2014). In this study, English-use-at-home is a composite measure of both input to children and children's output. Is it possible that parents' use of English to the children, if that English were heavily accented and contained morphological errors, could have contributed to the variability in children's English output? The proportion of English-use-at-home among all family members was .36 to .40 on average (see Table 1), meaning more Chinese was spoken amongst them. Breaking apart the variable of English-use-at-home to just parents' use of English, the proportion shrinks to .27–.28 on average across three rounds. It is also important to keep in mind that the city itself, and the schools children were attending, are culturally and linguistically diverse, and so children's input outside the home comprised a variety of English speakers, including native speakers. Furthermore, it is reasonable to assume that for eight- to ten-year-old children, the variety of language input sources (school, friends, media) beyond interaction with parents would be much greater than for younger children.
Therefore, the small amount of individual variation predicted by English-use-at-home, together with the limited amount of English actually used by parents at home, indicates that for this sample of children, it is unlikely that parents' accented speech was a major contributor to the variability in children's outcomes. Nevertheless, future research examining input factors and children's L2 outcomes should include a more fine-grained breakdown of these factors and some direct measures. Doing so would enable us to better understand the balance between internal and external factors predicting individual variation in morphological acquisition.
Finally, the analyses in this study revealed that factors influencing children's L2 abilities with verb morphology at the early stages of acquisition continue to shape their development even at later stages. For example, studies with child L2 learners with less L2 exposure have found that verbal short-term memory, vocabulary size, L2 input quality and quantity, and allomorph and word frequency influence L2 morphological acquisition (Armon-Lotem et al., 2011; Blom & Paradis, 2013, 2015; Blom et al., 2012; Paradis, 2011). Other studies that included children with long-term exposure have also found that L2 input quality and quantity factors (Chondrogianni & Marinis, 2011; Jia & Fuse, 2007; Unsworth, 2013; Unsworth et al., 2014), and language-level factors (Marinis & Chondrogianni, 2010) predict L2 grammatical abilities. The results from this study, together with the existing literature, raise the question of why individual difference and language-level factors would still be exerting an effect at later stages in acquisition. For example, if a vocabulary of a certain size constituted a ‘critical mass’ needed for children to begin to become productive with verb morphology (e.g. Marchman & Bates, 1994), why would vocabulary size matter at later stages when productivity is clearly evident? The continued influence of these individual difference and language factors across years of L2 acquisition indicates that they do not serve a kind of ‘bootstrapping’ function early on and then fade away.
The continued influence of these factors could be argued to support Usage-Based or Emergentist models of morphology and the lexicon (e.g. Bybee, 2010; Ellis, 2008). This is because such models assume that lexical composition, input frequency, and cognitive mechanisms like verbal memory skills all impact morphological learning, processing, and use across the lifespan, and also influence diachronic change.
This study found that there was individual variation in the children's long-term L2 outcomes with verb morphology. Thirty-nine percent of the children had acquired native-like levels of accuracy for all morphemes by 6;4 years of L2 exposure, whereas 61% had not reached this level for all morphemes. Because all the children in this study had similar early AoAs, the likelihood of individuals achieving native-like accuracy in their L2 in this timeframe was due to non-AoA factors. Variation in children's accuracy with English morphology was predicted by variation in verbal short-term memory, vocabulary size, and child- and language-level input factors. Our results suggest that the four-to-six-year ‘catching up’ timeframe for L2 oral language (Hakuta et al., 2000; Saunders & O'Brien, 2006) is insufficient for verb morphology, at least for children from typologically isolating L1 backgrounds. Another way to interpret these results is that this timeframe is sufficient because variable use, and thus divergence from monolingual accuracy levels, might constitute the long-term outcome for some bilingual speakers. This alternative interpretation raises the broader question of what the appropriate expectations are for child L2 acquisition, and whether monolingual speakers should be the ‘gold standard’ for comparison (Ellis, 2008; Muñoz & Singleton, 2011). Child bilingual speakers arguably have more sources of variation in their learning experience than monolinguals, and since these sources of variation shape their L2 acquisition at both early and later stages, it is logical to expect greater variability in linguistic outcomes.
While we believe comparisons between monolingual native speakers and bilinguals can be informative from a scientific perspective, we also believe that interpretations of divergence between child bilinguals and monolinguals in long-term outcomes should be careful not to promote a deficit view of bilingualism (cf. Muñoz & Singleton, 2011).