Over the past 50 years, scholars have shown that the outcomes of postpubertal second-language (L2) speech learning are characterized by considerable individual differences. Many scholars have ascribed such variation to a range of biographical factors, such as earlier age of acquisition (Hopp & Schmid, 2013), longer immersion experience (Trofimovich & Baker, 2006), more frequent L2 use (Jia & Aaronson, 2003), and greater willingness to communicate (Derwing & Munro, 2013). However, such experience-related factors do not fully explain the observed variance, especially when it comes to the incidence of highly advanced L2 learners (Abrahamsson & Hyltenstam, 2009) and the acquisition of relatively complex, non-salient, and difficult linguistic features (e.g., Li, 2016). According to the aptitude-acquisition view (Doughty, 2019), even learners with comparable biographical backgrounds who spend the same amount of time practicing a target language in the same fashion may vary greatly in their resulting levels of proficiency. This is arguably due to perceptual-cognitive individual differences (i.e., aptitude) that in turn determine the extent to which L2 learners can take advantage of every input opportunity, maximizing long-term learning gains. Following this line of thought, a growing number of scholars have begun to demonstrate that domain-general auditory processing, which the cognitive psychology literature has identified as a foundation of first-language (L1) acquisition, can explain some of the variance in phonological dimensions of L2 speech learning (Kachlicka et al., 2019; Omote et al., 2017; for a comprehensive review, see Saito et al., 2021).
In this paper, we report the results of an empirical study examining the extent to which the link between auditory precision and acquisition can be generalized to adult L2 learners’ processing and acquisition of higher-order linguistic information, that is, the appropriate use of diverse, rich, and abstract vocabulary during spontaneous speech in the context of 100 Polish-English bilinguals with varied age and experience backgrounds in the UK.
Modeling, assessing, and developing spoken L2 vocabulary proficiency
Whereas vocabulary is considered to be an integral unit of L2 learning, much of the existing work has been concerned with the assessment and development of receptive vocabulary knowledge. It has been shown: (a) that such receptive knowledge can be operationalized as vocabulary size (2–3 k frequent word families for beginner L2 speakers; 24 k frequent word families for L1 speakers; Webb & Nation, 2017); (b) that L2 speakers’ vocabulary size continues to improve as a function of increased input (for a discussion on the number of encounters vs. acquisition, see Pellicer-Sánchez, 2016); (c) that many learners can achieve nativelike vocabulary size as long as they engage in a great deal of L2 immersion experience (Hellman, 2011); and (d) that vocabulary size may be strongly correlated with a wide range of global L2 skills (e.g., listening, reading, speaking, and writing; see Schmitt, 2010).
In contrast, productive L2 vocabulary has remained understudied. For example, word frequency does not serve as a reliable index of advanced L2 speakers’ spoken vocabulary use, as they do not necessarily use more infrequent words while speaking (Crossley et al., 2019). For a long time, scholars have debated how productive vocabulary knowledge can be assessed, how L2 speakers develop it, and what kinds of factors matter for its acquisition (for a review, see Koizumi, 2012). With respect to spoken L2 vocabulary, prior studies have indicated that even highly experienced L2 speakers’ productive vocabulary use is subject to a great deal of individual variation, hinting at the possibility that some form of aptitude may play a critical role in determining the incidence of high-level productive L2 vocabulary attainment (e.g., Hyltenstam, 1988).
Recently, Crossley and his colleagues have proposed, developed, and refined a computational model of L2 learners’ spoken vocabulary use (Crossley et al., 2015; Kyle & Crossley, 2015). Within this framework, the lexical dimensions of L2 speech are analyzed from two different perspectives. The first dimension (appropriateness) is defined as the ability to use a combination of words in a contextually appropriate and nativelike manner with the correct assignment of morphological markers. For similar accounts of appropriateness, see the semantic, collocational, and grammatical functions of words in Nation’s (2001) model of L2 vocabulary knowledge. The second dimension (richness) is defined as the ability to use more infrequent, context-specific, and abstract words. This corresponds to the breadth and depth of word knowledge in Ellis’s (2002) model of lexical acquisition. Though few in number, empirical studies have examined how the appropriateness and richness aspects of L2 lexical knowledge develop among various types of L2 learners.
In terms of the initial phase of immersion (length of residence [LOR] < 1 year), much of the learning appears to benefit L2 learners’ use of rich and varied vocabulary. Crossley and his colleagues longitudinally analyzed the lexical richness of six L2 learners’ speech over 1 year. Participants’ spoken vocabulary quickly became more abstract, featuring more hypernyms and fewer concrete words, especially within the first 4 months (Salsbury et al., 2011). As for the ultimate attainment of more experienced and advanced L2 learners’ vocabulary use, the literature has been severely limited. Bartning et al. (2012) investigated the spoken morphological accuracy of 20 experienced late L2 learners who were native speakers of Swedish (LOR > 5 years). The results demonstrated that the participants’ accuracy performance was significantly distinguishable from that of both inexperienced learners (LOR < 2 years) and native controls.
In the context of 100+ Japanese learners of English with varied experience profiles in naturalistic and classroom settings, Saito (2015, 2019, forthcoming) examined the degree of vocabulary appropriateness (lexical, collocational, and morphological accuracy) and richness (frequency, range/context specificity, and abstractness). Experienced learners’ (LOR > 6 years) spoken vocabulary use was significantly more accurate, varied, and richer than that of inexperienced learners in spontaneous speech. Interestingly, whereas few ultimately attained nativelike lexical accuracy, many experienced participants’ richness performance was indistinguishable from that of native controls. The results indicate that the rate and ultimate attainment of spoken L2 vocabulary learning may differ in appropriateness and richness. On the one hand, many L2 learners can expand L2 vocabulary richness and reach a nativelike level within a short period of immersion (< 1 year), as long as they practice and use the target language. On the other hand, whereas L2 learners’ vocabulary use tends to be more accurate as a result of increased immersion, the incidence of nativelike accuracy appears to be limited to very few individuals (cf. Hyltenstam, 1988).
In the current study, we test the hypothesis that the outcomes of spoken L2 vocabulary development can be explained not only by experience-related factors (length, quality, and timing of L2 use), but also by learners’ aptitude profiles (i.e., auditory processing). More specifically, we assume that the link between aptitude and acquisition can be most clearly observed in the relatively difficult aspects of L2 vocabulary learning, that is, appropriateness rather than richness (see the Results section for the benchmark analyses of L1 and L2 speakers’ vocabulary proficiency). In this way, those with greater aptitude are expected to attain high-level L2 lexical proficiency after years of immersion, as they can make the most of every practice opportunity in naturalistic settings (Doughty, 2019).
Domain-general auditory processing in L1 acquisition
In the field of cognitive psychology, one major theoretical debate concerns whether, to what degree, and how certain regions of the brain are specifically involved in human language acquisition (for an overview, see Campbell & Tyler, 2018). One influential view states that the same perceptual-cognitive faculties govern a range of general-purpose learning behaviors, including language learning. One such domain-general capacity that has received much attention is auditory processing. The term refers collectively to a set of basic, low-level perception skills used to encode, represent, and remember the frequency and time dimensions of sounds (e.g., pitch, formants, duration, and amplitude). Many scholars have argued that individual differences in such auditory perception skills play a key role in the speed, development, and delay of first-language (L1) acquisition (i.e., the auditory-deficit theory; Goswami, 2015; Tallal, 2004).
Auditory processing serves as “the gateway to spoken language” (Mueller et al., 2012, p. 15953), as it anchors every stage of phonological, lexical, and morphosyntactic processing. In order to detect phonetic and phonological categories, it is necessary to encode the relative weights of multiple acoustic cues, such as formant height, shape, and length for vowels (Kuhl, 2000) and approximants (Espy-Wilson et al., 2000), pitch and voice onset time for stop consonants (Shultz et al., 2012), and pitch height and contour for lexical tones (Chandrasekaran et al., 2010). More robust, prompt, and automatic phonetic and phonological analyses directly relate to the activation of contextually appropriate target words (Norris & McQueen, 2008), the detection of word and sentence boundaries (Cutler & Butterfield, 1992), and the refinement of morphological details (Joanisse & Seidenberg, 1998).
In typical language development, auditory sensitivity continues to grow up to the age of 8 to 9 years, followed by a gradually declining curve through older adulthood (Skoe et al., 2015). There is ample evidence that when toddlers experience difficulties at the level of basic lower-level auditory perception, their acquisition of phonetic, phonological, lexical, and morphosyntactic knowledge is slowed down, resulting in a range of global language problems (for a research synthesis, see Hämäläinen et al., 2013). For example, global language skills, such as reading and phonological awareness, are linked to the perception of nonverbal spectral (pitch and formants) and temporal (duration and amplitude) encoding (Foxton et al., 2003; Grube et al., 2012). Thus, there is much correlational evidence showing that dyslexic children are more likely to have auditory deficits (Casini et al., 2018; Goswami et al., 2011; Won et al., 2016). Some scholars have suggested auditory processing measures as a diagnostic tool for dyslexia (Hornickel & Kraus, 2013) and other language-related disorders (Russo et al., 2008).
As for normal-hearing children (i.e., children who have not been diagnosed with specific language impairment or dyslexia), there is ample research examining the relationship between individual differences in auditory processing and language skills (e.g., Anvari et al., 2002; Bavin et al., 2010; Boets et al., 2008; Douglas & Willatts, 1994; Lamb & Gregory, 1993; Talcott et al., 2000; Tierney et al., 2021). In essence, these studies have indicated (a) that children without hearing impairment nonetheless vary in auditory abilities and (b) that this variability is linked to a range of language outcomes (speech-in-noise perception, vocabulary use, literacy, and phonological awareness).
When it comes to normal-hearing adults, similar individual variation has been observed (e.g., Kidd et al., 2007). However, the correlations between auditory processing and speech perception abilities appear to be unclear (but see Ahissar et al., 2000; Surprenant & Watson, 2001). One possible reason for this could be related to the various redundancies in speech perception. Every phonological contrast involves the complex integration of multiple acoustic signals. Due to the existence of multiple redundant cues, when listeners fail to perceive one, they may still accurately perceive the phoneme based on a different cue (e.g., the perception of English stops using voice onset time and/or pitch in following vowels; Toscano & McMurray, 2010). As L1 speakers regularly engage in language-based interactions during which they receive input for prolonged periods of time, accumulating a great deal of relevant speech perception experience, even those with particular auditory deficits may identify and adopt unique cue weighting strategies to optimize speech recognition (e.g., Jasmin et al., 2019, for the case of amusics using duration rather than pitch cues for the normal perception of speech and music).
Domain-general auditory processing in L2 acquisition
More recently, some scholars (e.g., Saito et al., 2020a) have begun to argue not only that auditory processing could explain some variance in adult L2 learners’ speech learning outcomes but also that it may play an even more influential role in L2 than in L1 acquisition because of the quantitative and qualitative differences between L1 and L2 learning processes. In L1 acquisition, even infants with auditory perception deficits may overcome acquisition problems with extensive exposure to input over a long period of time (Rosen, 2003). In contrast, adult L2 learners typically have limited exposure to their target language, even under immersion conditions (Jia & Aaronson, 2003). Unlike in L1 acquisition, the lack of sufficient exposure opportunities may prevent L2 speakers from compensating for any perceptual deficit hindering their L2 comprehension development. In L2 learning contexts, any perceptual advantage or disadvantage can therefore more strongly predict the extent to which L2 learners can benefit from such limited input opportunities (Doughty, 2019).
Compared to L1 acquisition, in which auditory category learning takes place on a blank slate (free of prior phonetic experience), it is important to note that adult L2 learners filter new language input through their already-established auditory representations. In particular, they have to attend to new cues when L2 phonetic and phonological categories differ from those of the L1. For example, Japanese speakers must learn to perceive differences in the third formant to acquire English [r] and [l] (Iverson et al., 2003). They also have to adjust and re-tune existing analysis patterns when the cue weightings only partially overlap between L1 and L2 sounds (e.g., Chinese speakers need to deprioritize pitch and prioritize duration cues to acquire English word and sentence stress patterns; Jasmin et al., 2020). Developing and/or adjusting perceptual strategies to rely on new sources of input may draw on the ability to precisely and explicitly encode auditory dimensions, and so individual differences in auditory processing may demonstrate even greater predictive power for adult L2 speech learning success (Doughty, 2019).
With regard to testing the auditory-deficit hypothesis in L2 acquisition, there is a growing body of empirical research featuring a wide range of adult L2 speakers with diverse experience backgrounds. This work has found that individuals with more precise auditory processing abilities can hear and remember unfamiliar sounds and words more quickly when they are exposed to them (e.g., Kempe et al., 2015; Wong & Perrachione, 2007). In addition, individuals who demonstrate greater sensitivity to key acoustic information in a gradient manner (perceiving fine differences within sound categories) are more capable of integrating multiple cues into a single percept (e.g., Kim et al., 2020, for the weighting of vowel quality and quantity in the perception of synthesized vowel contrasts; Kong & Edwards, 2016, for the weighting of voice onset time and pitch in the perception of synthesized stop voicing contrasts; Chandrasekaran et al., 2010, for the weighting of pitch direction and height in English speakers’ perception of Mandarin lexical tones).
When it comes to naturalistic L2 speech learning, individual variation in auditory processing has been found to predict learning success even after the length and quality of experience are controlled for (Omote et al., 2017, for segmental and suprasegmental perception; Saito et al., 2020a, for segmental and suprasegmental production). Interestingly, the relationship between auditory processing and acquisition tends to be stronger when L2 learners have engaged in a sufficient amount of immersion experience (e.g., > 1 year; Saito, Sun, Kachlicka, et al., 2020), and when the analyses focus on the relatively difficult aspects of L2 speech learning (e.g., phonological accuracy rather than fluency; Saito et al., 2020a). In contrast, the predictive power of auditory processing may be smaller when learners lack opportunities to be exposed to extensive, interactive, and varied aural input (Saito et al., 2021, for classroom L2 learners).
Not surprisingly, all the aforementioned studies have exclusively concerned the relationship between auditory perception and L2 phonology, since the role of auditory input processing is most directly relevant to segmental and suprasegmental acquisition. If we take the theoretical stance that auditory processing is a bottleneck for various dimensions of L2 speech acquisition, the question now becomes the extent to which auditory processing influences the acquisition of higher-level linguistic competence beyond phonological refinements—that is, appropriate use of rich and varied vocabulary items. The current study is designed to address this issue.
Motivation for current study
There are several reasons to predict the presence of a relationship between more precise auditory processing and spoken L2 vocabulary development. Although the context of the topic is exclusively limited to English, there is some discussion to support the hypothesis that auditory processing drives the lexical, morphosyntactic, and global aspects of L2 learning.
At the lexical level, the detection of L2 lexical and sentence stress patterns is claimed to be fundamental to segmenting and making input available for word analyses (Field, 2005). Given that these linguistic phenomena are marked by changes in pitch, duration, and amplitude, it is reasonable to assume that individuals with greater sensitivity to the relevant acoustic dimensions can better encode, notice, and internalize novel or L2 prosodic patterns relative to L1 counterparts (Chandrasekaran et al., 2010). Relatedly, speech corpus research has shown that more frequent collocations are characterized by shorter word duration (Gregory et al., 1999) and that more frequent, predictable, and/or redundant words have shorter durations as well as reduced pitch and amplitude range (Shattuck-Hufnagel & Turk, 1996; see also Bybee & Scheibman, 1996, for the relationship between collocational strength and vowel reduction). Thus, we hypothesize that robust prosodic processing may help L2 learners infer from every instance of input not only which parts of speech could be chunked together but also whether they serve as frequent collocational units.
At the morphosyntactic level, it has been shown that the linguistic features with which L2 learners have the most difficulty tend to have fewer phonemes, and low syllabicity and sonority (e.g., Goldschneider & DeKeyser, 2001). According to the prosodic account of L2 grammar (Goad & White, 2019), the accurate encoding of prosodic cues is believed to be a necessary condition for the acquisition of complex morphology (e.g., inflection), syntax (e.g., word order), and semantics (e.g., articles). Thus, we hypothesize that individual differences in pitch, duration, and amplitude rise time may determine the extent to which learners can extract L2 morphosyntactic information from aural input and that those with more precise prosodic processing can demonstrate more advanced L2 morphosyntactic proficiency (e.g., Kachlicka et al., 2019; Saito et al., 2020a).
At the global level, whereas L2 vocabulary knowledge is instrumental to global reading and listening skills, it has been shown that those who have attained highly advanced L2 listening and reading proficiency are likely to have greater working memory, attentional control, and phonological awareness (Vafaee & Suzuki, 2020; Wallace, 2020). Importantly, scholars have also demonstrated that auditory processing and cognitive abilities are interwoven with each other (Ahissar et al., 2006; Grube et al., 2012; Snowling et al., 2018). As such, auditory processing and memory abilities can simultaneously help learners hold aural information for a longer period of time, thereby making it available for more robust acoustic analyses (Zhang et al., 2016).
Given that auditory processing is an important determinant of phonological aspects of L2 speech learning (e.g., Omote et al., 2017), and that the mechanisms underlying the individual differences in L2 lexical production development and attainment have remained open to investigation (Saito, 2015, 2020), the current study explored the relationship between a total of 100 late Polish-English bilinguals’ profiles of auditory processing (pitch, duration, and amplitude rise time), biographical backgrounds (length of immersion, musical training, and age of arrival [AOA]), and spoken vocabulary proficiency (appropriateness and richness). The following research question, followed by predictions, was formulated:
Does auditory processing relate to postpubertal L2 learners’ spoken vocabulary proficiency when biographical factors are controlled for and, if so, to what degree and how?
According to the cross-sectional and longitudinal investigations, much vocabulary learning can be observed in richness (e.g., Salsbury et al., 2011), and an extensive period of regular and frequent L2 use may be needed to make a perceptible change in appropriateness (e.g., Saito, 2019, for 10+ years of immersion). As stated in the aptitude-acquisition hypothesis (Doughty, 2019), it is in such relatively difficult aspects (i.e., appropriateness rather than richness) that aptitude (including auditory processing) may play a key role in determining the extent to which certain L2 learners can attain advanced L2 lexicogrammatical proficiency (rich and accurate). Specifically, more precise auditory processing (of pitch, duration, and amplitude in particular) may help L2 learners: (a) segment aural input into words with the accurate use of lexical stress; (b) detect, internalize, and use more frequent, strongly combined collocational chunks in a contextually appropriate manner; and (c) access perceptually non-salient morphological markers (fewer phonemes, low syllabicity, and sonority; e.g., Goldschneider & DeKeyser, 2001, for regular past tense).
The participants were 100 Polish residents in the UK whose pronunciation performance was assessed in the precursor project. The length of immersion varied widely from 0.1 to 19 years. The data collection (speech and auditory processing tests) was conducted with a researcher at a university in London. While the frequency of daily L2 use varied across different contexts (work, home, and family), the participants reported in individual interviews that their main language of communication, at work and/or at home, was L2 English. After experiencing 6–15 years of English-as-a-Foreign-Language education in Poland, the participants arrived in the UK after puberty (AOA > 17 years). None reported any hearing or reading problems. All the biographical information is detailed in Table 1.
In the precursor research (Saito et al., 2020a), the differential effects of age, experience, and auditory processing on the participants’ phonological accuracy and fluency were measured. The analyses and findings were based on short speech samples (30 s per participant) elicited using a semi-structured speaking task (i.e., picture description), which would fall short of the length threshold for spoken lexicogrammar analyses. With a view to conducting robust analyses of spoken lexicogrammar, scholars have suggested 100+ words as a minimum length requirement (Koizumi & In’nami, 2012). To elicit sufficiently long spontaneous speech samples, and to capture appropriate use of vocabulary and grammar, longer speech samples from the same participants were elicited using a different task, that is, an oral interview.
Free speech tasks, such as the oral interview task in this study, have been widely used in L2 vocabulary research (e.g., Crossley et al., 2015) and high-stakes speaking-ability tests (e.g., IELTS). First, participants were asked to talk about the following topic: “What was the hardest and toughest challenge in your life?” After 1 min of planning time, they spoke for roughly 2 min. Finally, the researcher asked a few follow-up questions in response to the content of their speech (for the materials used in the study, see Supporting Information-A). Compared to the highly structured task used in the precursor project, wherein participants focused on describing already provided information (picture narratives), the format of the interview task could be considered less structured, encouraging participants to produce longer and more complex speech while talking freely about more informal, familiar, and personal topics (see Skehan, 1998).
To control for the effects of phonological quality on L2 analyses, all the recordings were transcribed and cleaned by removing filled pauses (e.g., “ah, eh, um”) and fixing obvious mispronunciation problems (e.g., life, pronounced as rife, would still be spelled as life). The length of the transcripts varied widely (M = 503.1 words, Range = 106–1264 words). Four researchers initially transcribed the same five speech samples (out of the entire dataset of 100 speech samples) to compare their agreement. While their transcripts largely agreed with each other, they discussed any discrepancies and agreed on some transcription conventions (see Supporting Information-B). Afterward, the remaining 95 samples were divided among the four researchers, each of whom individually transcribed 20–25 samples. Whenever they encountered ambiguous situations, they consulted with each other to ensure that they had consistently followed the agreed conventions.
Analyses of appropriateness
To capture the multifaceted nature of appropriateness, three different approaches were adopted:
To account for the potentially different degrees of error gravity on global comprehension and communicative adequacy, scholars have emphasized the importance of expert raters’ holistic judgments (Foster & Wigglesworth, 2016). Following the training procedure in Saito (2019), a total of five linguistically trained raters were recruited to assess semantic and morphosyntactic dimensions of appropriateness.
The raters included three native speakers of English (two from the UK and one from the USA) and two near-native speakers of English (one from Estonia and one from Germany). All of them had received several years of linguistics training at universities in London and had a significant amount of experience in L2 speech analyses, as they regularly participated in empirical research projects of this kind. They reported high levels of familiarity with vocabulary use in British English and foreign-accented English in the UK. As reported below, all the raters demonstrated relatively high inter-rater agreement.
The rating sessions took place individually under the supervision of a researcher. The raters first received definitions for the two different areas of appropriateness: (a) semantic and (b) morphosyntactic. For training scripts and onscreen labels, see Supporting Information-C. During the assessment, the samples were displayed on a computer screen in a randomized order using MATLAB software. For each token, the degree of appropriateness was assessed using a moving slider. If the slider was placed at the leftmost end of the continuum, labeled with a frowning face (indicating “non-targetlike”), the rating was recorded as 0. If the slider was placed at the rightmost end of the continuum, labeled with a smiley face (indicating “targetlike”), the rating was recorded as 1000. The scoring method was explicitly explained to the raters. None of them asked any questions. To avoid any confusion (as reported in some L2 assessment research using a numbered scale; Isaacs & Thomson, 2013), no numerical values were displayed on the screen.
To ensure the raters’ understanding of the procedure, they evaluated three practice transcripts (not included in the main dataset) and explained/justified their decisions. For each response, the researcher gave feedback to ensure that the raters handled the two different categories without confusion. Finally, the raters moved on to the main dataset of 100 transcripts.
In our pilot run, the length of a session turned out to be a problem as some transcripts were long (> 1000 words). To reduce rater fatigue, all transcripts were cut down to 250 words, except for several samples that were already shorter than 250 words. Each session lasted for approximately 3 hr (including training and practice), and the raters took a short break (10 min) halfway through. A Cronbach’s alpha analysis revealed that the five raters demonstrated relatively strong agreement for semantic appropriateness (α = .81) and morphosyntactic appropriateness (α = .83). According to the post-rating questionnaire, the raters reported that they not only understood but also handled the two rubrics throughout the judgment sessions without confusion (M = 9 on a 9-point scale; 1 = “very difficult,” 9 = “very easy and comfortable”). The five raters’ scores were averaged to generate two scores for each transcript, quantifying its semantic and morphosyntactic appropriateness.
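The agreement and averaging steps described above can be illustrated with a minimal sketch. It assumes that each rater is treated as one "item" in the standard Cronbach’s alpha formula and that sample variances are used; the function names and the data layout (one list of slider scores per rater) are hypothetical and are not taken from the original analysis scripts.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha, treating each rater as one item.

    ratings: list of per-rater lists, one score (0-1000) per transcript.
    """
    k = len(ratings)                      # number of raters
    n = len(ratings[0])                   # number of transcripts
    rater_vars = [variance(r) for r in ratings]
    # Total score per transcript, summed over raters
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    return k / (k - 1) * (1 - sum(rater_vars) / variance(totals))


def average_scores(ratings):
    """Mean score per transcript across raters."""
    n = len(ratings[0])
    return [mean([r[i] for r in ratings]) for i in range(n)]
```

Under this sketch, raters who give identical scores to every transcript yield α = 1, and the averaged list would supply one appropriateness score per transcript for each rubric.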
Given that the task was designed to elicit the use of the past tense, local morphosyntactic accuracy was operationalized by dividing the number of past tense errors by the number of obligatory contexts per sample (for a similar approach, see Kourtali & Révész, Reference Kourtali and Révész2020). The past tense in English is considered perceptually non-salient (Goldschneider & DeKeyser, Reference Goldschneider and DeKeyser2001), semantically redundant and less intrusive to communicative success (VanPatten, Reference VanPatten2002), and difficult for many adult L2 learners of English (Ellis et al., Reference Ellis, Loewen and Erlam2006). Two linguistically trained coders first analyzed 20 (out of 100) samples. The inter-coder agreement was relatively high (r = .91). The first coder completed the rest of the analyses. The participants’ error ratios ranged widely (M = 24.9%; SD = 18.3; Range = 0–92%). Since the obligatory context analysis could be influenced by text length, we also calculated residual accuracy scores with the length factor statistically controlled for.
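As a rough illustration of the two local accuracy scores, the sketch below computes a past tense error ratio and then residualizes a set of ratios on text length. The text does not specify how the residual scores were derived, so the use of a simple ordinary least squares regression on text length is an assumption, and all function names are hypothetical.

```python
from statistics import mean


def error_ratio(errors, obligatory_contexts):
    """Past tense errors divided by obligatory contexts in one sample."""
    return errors / obligatory_contexts


def residualize(scores, lengths):
    """Residuals of scores after a simple OLS regression on text length.

    Assumption: one linear predictor (length in words) and an intercept.
    """
    mx, my = mean(lengths), mean(scores)
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, scores))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(lengths, scores)]
```

A positive residual would then mark a speaker whose error ratio is higher than expected for a transcript of that length.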
Collocation is broadly defined as a meaningful combination of multiword expressions (Gablasova et al., Reference Gablasova, Brezina and McEnery2017) and has been found to serve as a primary determinant of humans’ intuitive judgments of lexical appropriateness (Saito, Reference Saito2020). To measure collocation use, two corpus-based association measures were used, Mutual Information (MI) bigram and trigram. Conceptually, MI indicates the strength of the partnership between two- and three-word expressions, while controlling for the probability of random groupings of words. Collocations with higher MI scores consist of combinations of words which likely have fewer partner words. These words likely exhibit greater coherence, more distinctive meaning, and clearer discourse functions. To calculate MI, random co-occurrences of words were first estimated by dividing the number of any possible combinations within a fixed window size (n = ± 5 words in TAALES) by the total number of tokens in the reference corpus (British National Corpus). Then, the frequency of collocations was divided by the frequency of random co-occurrence among the words and then logarithmized.
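The logic of the MI calculation can be sketched for a two-word combination as follows. This is the textbook form of the measure (observed frequency divided by the frequency expected by chance, then logarithmized); it does not reproduce the window-size handling in TAALES, and the base-2 logarithm and the function name are assumptions.

```python
import math


def mutual_information(pair_freq, freq_w1, freq_w2, corpus_size):
    """MI for a bigram: log2 of observed over chance co-occurrence.

    Assumption: chance co-occurrence is estimated as
    freq_w1 * freq_w2 / corpus_size (no window-size correction).
    """
    expected = freq_w1 * freq_w2 / corpus_size
    return math.log2(pair_freq / expected)
```

For instance, two words that each occur 100 times in a 10,000-token corpus would be expected to co-occur once by chance, so eight observed co-occurrences give an MI of 3 in this sketch.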
Analyses of richness
The multifaceted nature of richness was approached from three different perspectives via TAALES 2.0 (Kyle & Crossley, Reference Kyle and Crossley2015):
Word frequency refers to the extent to which less frequent and less common words are used per sample. The index was calculated by dividing the total sum of frequency scores in reference to the British National Corpus by the number of all the words with frequency scores. In order to control for Zipfian effects in word frequency lists (higher-frequency words are more likely to be recycled), the raw scores were logarithmically transformed. Lower frequency scores indicate the use of less frequent words, which is characteristic of more advanced L2 lexical proficiency (Crossley & McNamara, Reference Crossley and McNamara2009).
Word range refers to the extent to which L2 speakers used more specific words which are narrowly used and observed in certain contexts and genres (rather than across diverse contexts). The index was calculated by dividing the total sum of range scores by the number of words in the texts with range scores. Like frequency, the raw scores were logarithmically transformed. Words with lower range scores indicate the use of more context-specific words (restricted to certain genres), which is considered as an index of more advanced L2 lexical proficiency (Kyle & Crossley, Reference Kyle and Crossley2015).
Abstractness refers to the extent to which words that are less concrete, imageable, and familiar are used per sample. In TAALES, native speakers’ perceived judgments of concreteness and imageability were stored for 4,000 content words based on the MRC psycholinguistics database (Coltheart, Reference Coltheart1981). The average concreteness, imageability, and familiarity scores were calculated for each transcript (0–1000 points). Thus, those who often use words with lower judgment scores (i.e., less concrete, imageable, and familiar words) could be considered to have more advanced L2 lexical proficiency (Salsbury et al., Reference Salsbury, Crossley and McNamara2011).
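The averaging logic shared by these indices can be illustrated with the word frequency measure. The sketch below assumes a hypothetical frequency table (a dictionary mapping words to reference-corpus counts) and a base-10 logarithm; it is not the TAALES implementation, and words without a frequency score are simply skipped, as the description of the index implies.

```python
import math


def mean_log_frequency(tokens, freq_table):
    """Mean log10 reference frequency over the tokens that have a score.

    freq_table is a hypothetical dict: word -> corpus frequency count.
    Returns None when no token has a frequency score.
    """
    logs = [math.log10(freq_table[t]) for t in tokens
            if freq_table.get(t, 0) > 0]
    if not logs:
        return None
    return sum(logs) / len(logs)
```

A transcript that relies on rarer words would receive a lower value under this sketch, in line with the interpretation of lower frequency scores given above.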
Auditory processing measures
Following the methodology widely used in cognitive psychology (e.g., Surprenant & Watson, Reference Surprenant and Watson2001), participants’ domain-general perception ability was assessed using a battery of psychophysical assessments. The materials used for the current study were developed in precursor studies (e.g., Kachlicka et al., Reference Kachlicka, Saito and Tierney2019). As reviewed earlier, the acoustic dimensions relevant to L2 vocabulary acquisition were assumed to involve participants’ thresholds for discrimination of pitch, duration, and amplitude. For each test, three complex tone stimuli were presented, with either the first or the third sounding different from the other two. Participants indicated which sound was different by either pressing the number “1” or “3” on a keyboard. An adaptive three-alternative forced-choice procedure was used, such that the difficulty of the task would decrease after every incorrect response and increase after every third correct response. The program continued until eight reversals had been reached, that is, incorrect answers after a string of successes or correct answers after a string of failures.
For each test, 100 continuous synthesized stimuli (500 ms in length) were created via custom MATLAB scripts. They differed at 100 steps along the target acoustic dimension (Levels 1–100). A total of 100 four-harmonic complex tones were created with F0 set to 330 Hz and the amplitude of each harmonic set to 40 dB. The target acoustic dimension for each test varied by a step of 0.3 Hz in F0 (330.3–360 Hz), 2.5 ms in duration (252.5–500 ms), and 1.22 ms in amplitude rise time (178–300 ms), respectively.
When the three tones were presented with an inter-stimulus interval of 0.5 s, the participants were asked to choose which of the three tones differed from the other two by pressing the number “1” or “3.” Based on Levitt’s (Reference Levitt1971) adaptive threshold procedure, the level of difficulty changed from trial to trial according to participants’ performance. The initial difficulty, that is, the level of the target stimulus, was set to level 50. When three correct responses were made in a row, the difference became smaller by a degree of 10 steps (more difficult). When their response was incorrect, the difference became wider by a degree of 10 steps (easier).
A reversal occurred when the direction of difficulty between trials changed, that is, when an increase in acoustic difference (easier) was followed by a decrease (more difficult), or vice versa. After the first reversal, the step size decreased from 10 to 5, and then from 5 to 1 after the second reversal. The tests stopped either after 70 trials or after 8 reversals. For participants’ auditory processing scores, the stimulus levels after the third reversal were averaged. Since the scores indicate how small a difference participants can perceive, lower scores indicate more precise auditory processing.
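The adaptive procedure can be sketched as a short simulation. Several details here are assumptions rather than a description of the original MATLAB scripts: the listener is a hypothetical callable that returns True for a correct response, levels are clamped to the 1–100 range, and the score is taken as the mean of the levels at which the third and later reversals occurred.

```python
def level_to_f0(level):
    """F0 (Hz) of the target tone at a given level of the pitch test."""
    return 330 + 0.3 * level


def run_staircase(respond, start=50, max_trials=70, max_reversals=8):
    """Three-down, one-up staircase over stimulus levels 1-100.

    respond(level) -> True if the simulated listener answers correctly.
    Returns the mean level at reversals from the third one onward
    (an assumed scoring rule), or None if fewer than three occurred.
    """
    level, step = start, 10
    streak = 0
    last_direction = None          # -1 = harder, +1 = easier
    reversal_levels = []
    for _ in range(max_trials):
        if len(reversal_levels) >= max_reversals:
            break
        if respond(level):
            streak += 1
            if streak < 3:
                continue           # level changes only after 3 in a row
            direction, streak = -1, 0
        else:
            direction, streak = 1, 0
        if last_direction is not None and direction != last_direction:
            reversal_levels.append(level)
            if len(reversal_levels) == 1:
                step = 5           # after the first reversal
            elif len(reversal_levels) == 2:
                step = 1           # after the second reversal
        last_direction = direction
        level = min(100, max(1, level + direction * step))
    scored = reversal_levels[2:]
    return sum(scored) / len(scored) if scored else None
```

With a deterministic listener who is correct whenever the level is 30 or higher, this sketch converges on alternating reversals at levels 29 and 30 and returns 29.5; a real listener would respond probabilistically near threshold.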
To check the reliability of the auditory processing tests, a follow-up project was conducted with 30 English users with diverse experience and proficiency levels (not included in the current study). They took a range of auditory processing tests (including pitch, duration, and amplitude discrimination) twice with an interval of 1 day. The results of Spearman’s correlation analyses demonstrated small-to-medium strength for the individual tests (r = .632 for pitch, .333 for duration, and .737 for rise time). As for the composite auditory processing scores (averaging pitch, duration, and rise time discrimination), the reliability (r = .720) could be considered satisfactory and comparable to similar research (e.g., r = 0.75 in Raz et al., Reference Raz, Willerman and Yama1987). The results suggest that although using individual test scores may result in low reliability (e.g., duration discrimination), composite test scores may serve as a more reliable proxy of one’s auditory precision (for methodological details, see Brief Report in Saito et al., Reference Saito, Sun and Tierney2020b).
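A test-retest coefficient of this kind can be computed as a Pearson correlation on ranked scores. The following is a minimal sketch with hypothetical function names; tied scores receive their average rank, which is the usual convention but is not stated in the text.

```python
from statistics import mean


def rank(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(time1, time2):
    """Spearman correlation between first and second test sessions."""
    return pearson(rank(time1), rank(time2))
```

Passing each participant’s thresholds from the first and second sessions to `spearman` would give one reliability coefficient per test under these assumptions.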
The descriptive results of pitch, duration, and amplitude rise time discrimination test scores are summarized in Supporting Information-D. Since the data significantly differed from the normal distribution (p < .01), their raw scores were transformed via a log10 function. To calculate participants’ overall prosodic encoding abilities, their scores were standardized and averaged. According to the results of the normality test (Kolmogorov–Smirnov), the resulting averaged scores were comparable to the normal distribution (p > .05) and thus were used for the subsequent analyses as a composite index of participants’ auditory processing of prosodic cues. Lower factor scores indicate more precise encoding of pitch, duration, and amplitude information.
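The composite score can be sketched as a three-step computation: log10 transformation, standardization within each test, and averaging across tests. The sketch assumes that standardization is applied to the log-transformed scores and uses the sample standard deviation; both points, and the function names, are assumptions.

```python
import math
from statistics import mean, stdev


def zscores(values):
    """Standardize values using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_scores(pitch, duration, rise_time):
    """Average of standardized log10 thresholds across the three tests.

    Each argument is a list of thresholds, one per participant,
    in the same participant order.
    """
    standardized = [
        zscores([math.log10(v) for v in test])
        for test in (pitch, duration, rise_time)
    ]
    return [mean(triple) for triple in zip(*standardized)]
```

Because lower thresholds indicate finer discrimination, a lower composite value corresponds to more precise auditory processing in this sketch as well.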
For the sound stimuli used in the auditory processing tests (duration, pitch, and rise time), see the team’s website (www.sla-speech-tools.com, under construction).
First, we present the results of preliminary analyses to examine what characterizes spoken L2 vocabulary proficiency among 100 Polish-English bilinguals relative to L1 counterparts. Second, we show the results of factor analyses to explore what underlies spoken L2 vocabulary proficiency (which we analyzed via 11 outcome measures) and auditory processing abilities (which we analyzed via 3 outcome measures). Subsequently, we present the results of multiple regression analyses to probe how a range of predictor variables related to experience and auditory processing are uniquely associated with various dimensions of participants’ L2 lexical proficiency.
Spoken L2 versus L1 vocabulary proficiency
The descriptive results of the 11 vocabulary measures are summarized in Table 2. To examine what characterizes spoken L2 vocabulary proficiency, a set of 95% confidence interval analyses was performed to check the extent to which Polish-English bilinguals’ performance overlapped with (or deviated from) that of L1 speakers. In a prior project (Saito, forthcoming), a total of 10 monolingual speakers of English (born and raised in the English-speaking areas of Canada) completed the same oral interview task. The results indicated two overall patterns: (a) some Polish-English bilinguals reached nativelike proficiency in terms of richness (overlaps in 95% intervals in all measures) and (b) appropriateness could be considered a relatively difficult dimension of spoken L2 vocabulary proficiency, as L2 speakers’ proficiency was significantly distinguishable from the native benchmark in five of six measures (i.e., lexical, morphosyntactic, and collocational accuracy).
Note. aThe native control data derives from Saito (forthcoming).
Constructs of spoken L2 vocabulary proficiency
To check whether and to what degree the 11 vocabulary measures tapped into the constructs that we intended to measure (n = 6 for appropriateness and n = 5 for richness), they were submitted to an exploratory factor analysis with Varimax rotation. The factorability of the entire dataset was considered adequate according to Bartlett’s test of sphericity (χ² = 1506.938, p < .001) and the Kaiser–Meyer–Olkin measure of sampling adequacy (.659). Using the standard of an eigenvalue beyond 1.0, a five-factor solution was suggested, accounting for 90.713% of the variance in the outcomes of the vocabulary measures.
In terms of factor loadings, 0.6 was used as a cutoff point in line with Hair et al.’s (Reference Hair, Black, Babin, Anderson and Tatham1998) recommendation for factor analyses of relatively small sample size (n < 100). In light of the grouping patterns in Table 3, Factor 1 was labeled as “holistic accuracy” as it clustered both of the appropriateness judgment scores, Factor 2 was labeled as “breadth” as it corresponded to the use of infrequent, context-specific, and unfamiliar words on a surface level, Factor 3 was labeled as “local accuracy,” Factor 4 was labeled as “abstractness” as it clustered the measures of word concreteness and imageability based on the MRC psycholinguistics database, and Factor 5 was labeled as “collocational accuracy” as it included both corpus-based n-gram measures. According to the results of the Kolmogorov–Smirnov tests, the distribution of the resulting factor scores was not significantly different from the normal distribution (p > .05), and thus the scores were used for the subsequent analyses without transformation.
Note. aThe direction of the factor scores was reversed to proxy what the original scores indicate (more accurate and more collocational).
Roles of experience and auditory processing in spoken L2 vocabulary
To examine the relative weights of the biographical and auditory processing factors in the outcomes of spoken L2 vocabulary proficiency, a set of stepwise multiple regression analyses was conducted on each proficiency dimension with a set of predictors related to auditory processing and experience. To avoid multicollinearity problems, the composite auditory processing scores (pitch, duration, and rise time discrimination) were used as a global index of auditory processing. Four experience factors were included as they have been extensively discussed in the existing literature as crucial factors affecting L2 speech acquisition (Flege, Reference Flege2018 for AOA; Saito, Reference Saito2015 for LOR; Flege & Liu, Reference Flege and Liu2001 for Current L2 Use; Muñoz, Reference Muñoz2014 for Length of EFL).Footnote 1 The mechanisms underlying L2 speech learning are said to differ between the early and later phases of immersion (DeKeyser, Reference DeKeyser2013). To this end, five interaction terms were included to see whether and to what degree the five predictors related differentially to L2 vocabulary proficiency in two different groups of L2 learners; dummy codes (1 and 2) were given to the interlanguage learner group (n = 50; LOR = 0.1–5 years) and the ultimate attainer group (n = 50; LOR = 6+ years). Finally, given that the length of participants’ speech varied widely (106–1264 words), this variable was also entered as a covariate. For each of the vocabulary proficiency dimensions (holistic accuracy, local accuracy, collocational accuracy, breadth, and abstractness), the following model was constructed:
Vocabulary Proficiency = Auditory Processing + Age of Arrival + Length of Residence + Current L2 Use + Length of EFL + Length of Speech + Auditory Processing × Group + Age of Arrival × Group + Length of Residence × Group + Current L2 Use × Group + EFL × Group
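To make the structure of the full model concrete, the sketch below builds the 11 predictor terms for one participant. The variable names are hypothetical, and two coding decisions are assumptions: the interlanguage group receives code 1 and the ultimate attainer group code 2, and the group boundary is placed at a length of residence of 6 years.

```python
PREDICTORS = [
    "auditory_processing",
    "age_of_arrival",
    "length_of_residence",
    "current_l2_use",
    "length_of_efl",
]


def group_code(length_of_residence):
    """Assumed coding: 1 = interlanguage (< 6 years), 2 = ultimate attainer."""
    return 1 if length_of_residence < 6 else 2


def design_row(participant):
    """Return the full set of model terms for one participant (a dict)."""
    group = group_code(participant["length_of_residence"])
    row = {name: participant[name] for name in PREDICTORS}
    row["length_of_speech"] = participant["length_of_speech"]  # covariate
    for name in PREDICTORS:
        # Interaction term: predictor multiplied by the group code
        row[name + "_x_group"] = participant[name] * group
    return row
```

Each row produced this way contains five main effects, one covariate, and five interaction terms, matching the 11 terms on the right-hand side of the model.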
Model selection was conducted via SPSS based on the results of F tests. Backward elimination was chosen. After all the independent variables were entered, the variable with the largest probability of F was removed at each step (using p = .10 as a benchmark). The selection was completed when no variables were eligible for elimination. The details of the model building processes for each vocabulary domain can be found in Supporting Information-E.
The final models are summarized in Table 4. The results generally showed that L2 accuracy was primarily predicted by auditory processing factors (composite prosodic processing scores) and secondarily by biographical factors (LOR and AOA). More specifically, the link between auditory processing and acquisition was weaker in local accuracy (related to the use of past tense) than in holistic and collocational accuracy (related to vocabulary use in general). The biographical factors related differently to the different types of accuracy. Holistic accuracy was tied to LOR, and local accuracy was associated with AOA. Interestingly, all the interaction effects were excluded from the final models in all instances, suggesting that the findings were generalizable across different stages of L2 acquisition (LOR = 0.1 to 40 years). None of the models of the richness factors (breadth and abstractness) reached statistical significance (p > .05). No clear sign of multicollinearity was found in any context (variance inflation factor < 1.231).Footnote 2
Note. aLower scores indicate lower error ratio (more accurate use of past tense).
Drawing on the auditory deficit theory in L1 acquisition, there is an emerging hypothesis that individual differences in experience, auditory processing, and L2 acquisition are interwoven (Mueller et al., Reference Mueller, Friederici and Männel2012). According to the precursor research, auditory processing is an important determinant of segmental and suprasegmental accuracy (rather than fluency) aspects of L2 speech, even when all the biographical factors (age, immersion experience, and music training) are controlled for (Kachlicka et al., Reference Kachlicka, Saito and Tierney2019; Saito et al., Reference Saito, Sun and Tierney2020a). To further scrutinize the generalizability of this link to higher-order dimensions of postpubertal L2 speech acquisition, we aimed to examine the effects of auditory processing in spoken L2 vocabulary development and attainment among a total of 100 late Polish-English bilinguals in the UK.
According to the results of the statistical analyses, L2 learners who attained more advanced L2 vocabulary proficiency had not only more relevant experience (extensive immersion and earlier AOA), but also more precise auditory processing ability. As predicted earlier, our findings here generally align with the view that one’s ability to track individual dimensions of prosodic information (i.e., pitch, duration, and amplitude) serves as a key driving force for detecting lexical and syntactic boundaries (De Pijper & Sanderman, Reference De Pijper and Sanderman1994). Thus, it is possible that with more precise prosodic processing abilities, learners can better represent, encode, and segment ambient input into lexical and syntactic units, resulting in the development of more robust phonological and morphosyntactic knowledge (Jiang, Reference Jiang2000; Best & Tyler, Reference Best and Tyler2007). Additionally, more precise auditory processing abilities are linked to greater phonological awareness and executive function, which in turn facilitates L2 reading and listening complementarily (Linck et al., Reference Linck, Hughes, Campbell, Silbert, Tare, Jackson and Doughty2013). Finally, those with more precise sound timing may detect more closely and frequently used multiword units, as they are delivered faster than other less common and less predictable combinations of words (Gregory et al., Reference Gregory, Raymond, Bell, Fosler-Lussier and Jurafsky1999).
Importantly, auditory processing could be fundamental, especially concerning appropriateness rather than breadth and abstractness dimensions of L2 lexicogrammar development. This is arguably because the former dimensions (appropriateness) are claimed to be more difficult than the latter dimensions (breadth and abstractness). As shown in the current study (the results of the benchmark analyses), there was a considerably larger distance between L1 and L2 speakers in appropriateness than in breadth and abstractness. The development of accuracy has been found to take place over a great deal of immersion experience at lexical (Saito, Reference Saito2019) and morphosyntactic levels (Bartning et al., Reference Bartning, Lundell and Hancock2012). According to the aptitude-acquisition view, it is the relatively difficult L2 learning aspects that are subject to a substantial amount of L2 experience and susceptible to the effects of individual differences in aptitude (Doughty, Reference Doughty2019). In terms of the breadth and abstractness aspects of spoken L2 vocabulary proficiency, the participants were comparable with each other regardless of experience and auditory profiles. This is arguably because many L2 learners’ vocabulary use could be sufficiently abstract even without much immersive experience (see Saito, Reference Saito2019; Salsbury et al., Reference Salsbury, Crossley and McNamara2011).
While the facilitative role of auditory processing is germane to higher-level linguistic skills to some degree, such as the production of L2 vocabulary, it is also important to remember that the outcomes of spoken L2 vocabulary development are only modestly related to auditory processing. In fact, the strength of the audition-acquisition link could be considered small (e.g., r = −.346 for holistic accuracy in Table 4). In prior research, the predictive power of auditory perception appeared to be more clearly observed in lower-order linguistic skills which directly involve auditory information, such as segmental and suprasegmental perception (e.g., Kachlicka et al., Reference Kachlicka, Saito and Tierney2019 for r = −.6) and production (e.g., Saito et al., Reference Saito, Sun and Tierney2020a for r = −.4 to −.5). Therefore, it would be intriguing to further examine whether and to what degree other cognitive measures may explain the remaining variance in spoken L2 vocabulary acquisition. Such potential predictors include working memory (Martin & Ellis, Reference Martin and Ellis2012), selective attention (Nicolay & Poncelet, Reference Nicolay and Poncelet2013), and foreign language aptitude (Li, Reference Li2016).
Given that the current study took an exploratory approach to delving into the role of auditory processing in spoken L2 vocabulary development, there are several methodological limitations that future studies should further remedy and expand. First, all the findings were based on the cross-sectional analyses of 100 late Polish-English bilinguals. To further examine the causal relationship between auditory processing, experience, and L2 speech learning, it is necessary to conduct a longitudinal investigation. For example, future studies should explore the variance in phonological and lexical aspects of L2 proficiency in participants with various auditory processing profiles over a certain period of training (Chandrasekaran et al., Reference Chandrasekaran, Sampath and Wong2010) and immersion (Sun et al., Reference Sun, Saito and Tierney2021).
Secondly, participants’ auditory processing was analyzed via the psychoacoustic tests. However, it has been argued that the test format (AXB discrimination) may not only reflect participants’ auditory precision but also involve a range of cognitive abilities, such as attentional control (Snowling et al., Reference Snowling, Gooch, McArthur and Hulme2018). To control for the separate effects of perceptual and cognitive individual differences, future studies should adopt both auditory processing and executive function tests (cf. Saito et al., forthcoming for the relationship between memory, auditory processing, and L2 speech learning).
Thirdly, whereas participants’ spoken vocabulary proficiency was elicited from a single-task condition (oral interview), it has been shown that L2 learners’ speech performance is susceptible to change as per task conditions (see Ellis, Reference Ellis2009 for an overview on task effects on appropriateness, richness, and fluency). The findings of the current investigation need to be replicated using multiple tasks differing in terms of the timing and length of planning time (Ahmadian & Tavakoli, Reference Ahmadian and Tavakoli2011), the degree of structural complexity (Foster & Tavakoli, Reference Foster and Tavakoli2009), and conceptualization (Saito, forthcoming).
Fourthly, the generalizability of the findings (i.e., prosodic processing vs. spoken L2 vocabulary) needs to be tested for diverse L1–L2 pairings. Although we argued that prosodic acuity matters for L2 vocabulary acquisition due to its relevance to word segmentation, it is important to note that the relative weights of prosodic cues may be highly language-specific. For example, it would be interesting to replicate the findings in L2 French speakers who use stress to parse linguistic units at sentence but not word level (e.g., Dupoux et al., Reference Dupoux, Pallier, Sebastian and Mehler1997 for the cross-linguistic differences in word and sentence stress assignment and its impact on stress deafness).
Finally, whereas the current study indicated a potential link between auditory processing and the acquisition of L2 English past tense, it needs to be acknowledged that little is known about how auditory processing is related to L2 morphosyntax at a fine-grained level. In the field of second-language acquisition, a growing amount of attention has been directed toward disentangling how phonology interfaces with various areas of grammar (for a comprehensive summary of the prosodic account of L2 behaviors, see Goad & White, Reference Goad and White2019). Given that Goldschneider and DeKeyser (Reference Goldschneider and DeKeyser2001) presented a plausible hierarchical framework for perceptual acuity and morphosyntactic learning, one promising enquiry concerns the extent to which L2 learners with different levels of auditory processing abilities master L2 morphosyntax with different levels of perceptual salience (e.g., sonority). There is a possibility that individual differences in auditory processing (a core component of phonology) may be integral to the acquisition of grammar which interfaces with lexicon, morphology, and syntax (e.g., inflection; Austin et al., Reference Austin, Chang, Kim and Daly2021) and with semantics and discourse (e.g., articles; Demuth & McCullough, Reference Demuth and McCullough2009).
All in all, our findings concur with the mounting empirical evidence that auditory processing is a determinant of how much L2 learners can benefit from immersion experience, resulting in more advanced outcomes (Saito et al., Reference Saito, Sun and Tierney2020a), and the theoretical view that the same driving faculty of L1 acquisition (i.e., auditory processing) is tied to every stage of L2 acquisition throughout an individual’s lifetime (Flege, Reference Flege2018). Building on the prior work (e.g., Saito et al., Reference Saito, Sun and Tierney2020a for segmental and suprasegmental production), we add that such audition effects are more clearly observed not only in the acquisition of relatively difficult features (accuracy rather than fluency, breadth, and abstractness), but also in the dimensions more closely related to the speech signal (phonology rather than lexicogrammar). Interestingly, hearing research has shown that auditory deficits can be remedied via focused training (e.g., Carcagno & Plack, Reference Carcagno and Plack2011 for 10 hr of pitch discrimination training). 
In light of the significant relationship between auditory processing and L2 speech learning (though its strength varies across different linguistic dimensions), our study hints at the possibility that auditory training may help L2 learners amplify and optimize their acquisition processes, if it is provided at the same time that they engage in a certain period of immersive experience in a target language-speaking country (e.g., study abroad), or when they receive intensive and/or meaning-oriented speech training (e.g., Barriuso & Hayes-Harb, Reference Barriuso and Hayes-Harb2018 for high-variability phonetic training; Lee & Lyster, Reference Lee and Lyster2016 for focus on form; Lim & Holt, Reference Lim and Holt2011 for incidental video-gaming; Mora & Levkina, Reference Mora and Levkina2017 for task-based pronunciation teaching; Shao et al., Reference Shao, Saito and Tierney2022 for repetition-based training). This represents a new interdisciplinary direction that linguistics, psychology, education, and hearing researchers can further pursue together.
This study was funded by Leverhulme Trust Research Grant (RPG-2019-039), Spencer Foundation Grant (202100074), and ESRC Connection Grant (ES/S013024/1). We gratefully acknowledge insightful comments from anonymous Applied Psycholinguistics reviewers on earlier versions of the manuscript.
This project obtained ethical approval from University of London.