
Roles of domain-general auditory processing in spoken second-language vocabulary attainment in adulthood

Published online by Cambridge University Press:  14 March 2022

Kazuya Saito*
University College London, London, UK
Konstantinos Macmillan
Birkbeck, University of London, London, UK
Sascha Kroeger
SOAS, University of London, London, UK
Viktoria Magne
University of West London, London, UK
Kotaro Takizawa
Waseda University, Shinjuku-Ku, Japan
Magdalena Kachlicka
University College London, London, UK
Adam Tierney
Birkbeck, University of London, London, UK
*Corresponding author. Email:


Recently, scholars have begun to explore the hypothesis that individual differences in domain-general auditory perception, which has been identified as an anchor of L1 acquisition, could explain some variance in postpubertal L2 learners’ segmental and suprasegmental learning in immersive settings. The current study set out to examine whether this link generalizes to the acquisition of higher-level linguistic production skills, that is, the appropriate use of diverse, rich, and abstract vocabulary. The speech of 100 Polish-English bilinguals was elicited using an interview task, submitted to corpus-/rater-based linguistic analyses, and linked to their ability to discriminate sounds based on individual acoustic dimensions (pitch, duration, and amplitude). According to the results, those who attained more advanced L2 lexical proficiency demonstrated not only more relevant experience (extensive immersion and earlier age of arrival), but also more precise auditory perception ability.

Original Article
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
© The Author(s), 2022. Published by Cambridge University Press

Over the past 50 years, scholars have shown that outcomes of postpubertal second-language (L2) speech learning are characterized by substantial individual differences. Many scholars have ascribed such variation to a range of biographical factors, such as earlier age of acquisition (Hopp & Schmid, 2013), longer immersion experience (Trofimovich & Baker, 2006), more frequent L2 use (Jia & Aaronson, 2003), and greater levels of willingness to communicate (Derwing & Munro, 2013). However, such experience-related factors do not fully explain the observed variance, especially when it comes to the incidence of highly advanced L2 learners (Abrahamsson & Hyltenstam, 2009) and the acquisition of relatively complex, non-salient, and difficult linguistic features (e.g., Li, 2016). According to the aptitude-acquisition view (Doughty, 2019), even learners with comparable biographical backgrounds who spend the same amount of time practicing a target language in the same fashion may vary greatly in their resulting levels of proficiency. This is arguably due to perceptual-cognitive individual differences (i.e., aptitude) that in turn determine the extent to which L2 learners can take advantage of every input opportunity, maximizing the long-term learning gains. Following this line of thought, a growing number of scholars have begun to demonstrate that domain-general auditory processing, which cognitive psychology literature has identified as a foundation of first-language (L1) acquisition, can explain some variance in phonological dimensions of L2 speech learning (Kachlicka et al., 2019; Omote et al., 2017; for a comprehensive review, see Saito et al., 2021).
In this paper, we report the results of an empirical study examining the extent to which the link between auditory precision and acquisition can be generalized to adult L2 learners’ processing and acquisition of higher-order linguistic information, that is, the appropriate use of diverse, rich, and abstract vocabulary during spontaneous speech in the context of 100 Polish-English bilinguals with varied age and experience backgrounds in the UK.

Modeling, assessing, and developing spoken L2 vocabulary proficiency

Whereas vocabulary is considered to be an integral unit of L2 learning, much of the existing work has been concerned with the assessment and development of receptive vocabulary knowledge. It has been shown: (a) that such receptive knowledge can be operationalized as vocabulary size (2–3 k frequent word families for beginner L2 speakers; 24 k frequent word families for L1 speakers; Webb & Nation, 2017); (b) that L2 speakers’ vocabulary size continues to improve as a function of increased input (for a discussion on the number of encounters vs. acquisition, see Pellicer-Sánchez, 2016); (c) that many learners can achieve nativelike vocabulary size as long as they engage in a great deal of L2 immersion experience (Hellman, 2011); and (d) that vocabulary size may be strongly correlated with a wide range of global L2 skills (e.g., listening, reading, speaking, and writing; see Schmitt, 2010).

In contrast, productive L2 vocabulary has remained understudied. For example, word frequency does not serve as a reliable index of advanced L2 speakers’ spoken vocabulary use, as they do not necessarily use more infrequent words while speaking (Crossley et al., 2019). For a long time, scholars have debated how productive vocabulary knowledge can be assessed, how L2 speakers develop it, and what kinds of factors matter for its acquisition (for a review, see Koizumi, 2012). With respect to spoken L2 vocabulary, prior studies have indicated that even highly experienced L2 speakers’ productive vocabulary use is subject to a great deal of individual variation, hinting at the possibility that some form of aptitude may play a critical role in determining the incidence of high-level productive L2 vocabulary attainment (e.g., Hyltenstam, 1988).

Recently, Crossley and his colleagues have proposed, developed, and refined a computational model of L2 learners’ spoken vocabulary use (Crossley et al., 2015; Kyle & Crossley, 2015). Within this framework, the lexical dimensions of L2 speech are analyzed from two different perspectives. The first dimension (appropriateness) is defined as the ability to use a combination of words in a contextually appropriate and nativelike manner with the correct assignment of morphological markers. For similar accounts of appropriateness, see the semantic, collocational, and grammatical functions of words in Nation’s (2001) model of L2 vocabulary knowledge. The second dimension (richness) is defined as the ability to use more infrequent, context-specific, and abstract words. This corresponds to the breadth and depth of word knowledge in Ellis’s (2002) model of lexical acquisition. Though few in number, empirical studies have examined how the appropriateness and richness aspects of L2 lexical knowledge develop among various types of L2 learners.

In terms of the initial phase of immersion (length of residence [LOR] < 1 year), much of the learning appears to benefit L2 learners’ use of rich and varied vocabulary. Crossley and his colleagues longitudinally analyzed the lexical richness of six L2 learners’ L2 speech development over 1 year. Participants’ spoken vocabulary quickly became more abstract, with more hypernyms and fewer concrete words, especially within the first 4 months (Salsbury et al., 2011). As for the ultimate attainment of more experienced and advanced L2 learners’ vocabulary use, the literature has been severely limited. Bartning et al. (2012) investigated the spoken morphological accuracy of the speech of 20 experienced late native Swedish learners of L2 (LOR > 5 years). The results demonstrated that the participants’ accuracy performance was significantly distinguishable from that of inexperienced learners (LOR < 2 years) and native controls.

In the context of 100+ Japanese learners of English with varied experience profiles in naturalistic and classroom settings, Saito (2015, 2019, forthcoming) examined the degree of vocabulary appropriateness (lexical, collocational, and morphological accuracy) and richness (frequency, range/context specificity, and abstractness). Experienced learners’ (LOR > 6 years) spoken vocabulary use was significantly more accurate, varied, and richer than that of inexperienced learners in spontaneous speech. Interestingly, whereas few ultimately attained nativelike lexical accuracy, many experienced participants’ richness performance was indistinguishable from native controls. The results indicate that the rate and ultimate attainment of spoken L2 vocabulary learning may differ in appropriateness and richness. On the one hand, many L2 learners can expand L2 vocabulary richness and reach a nativelike level within a short period of immersion (< 1 year), as long as they practice and use the target language. On the other hand, whereas L2 learners’ vocabulary use tends to be more accurate as a result of increased immersion, the incidence of nativelike accuracy appears to be limited to very few individuals (cf. Hyltenstam, 1988).

In the current study, we test the hypothesis that the outcomes of spoken L2 vocabulary development can be explained not only by experience-related factors (length, quality, and timing of L2 use), but also by learners’ aptitude profiles (i.e., auditory processing). More specifically, we assume that the aptitude and acquisition link can be most clearly observed in the relatively difficult aspects of L2 vocabulary learning, that is, appropriateness rather than richness (see the Results section for the benchmark analyses of L1 and L2 speakers’ vocabulary proficiency). In this way, those with greater aptitude are expected to attain high-level L2 lexical proficiency after years of immersion as they can make the most of every practice opportunity in naturalistic settings (Doughty, 2019).

Domain-general auditory processing in L1 acquisition

In the field of cognitive psychology, one major theoretical debate concerns whether, to what degree, and how certain regions of the brain are specifically involved in human language acquisition (for an overview, see Campbell & Tyler, 2018). One influential view states that the same perceptual-cognitive faculties govern a range of general-purpose learning behaviors including language learning, and an example of such a domain-general capacity that has received much attention is auditory processing. The term refers collectively to a set of basic, low-level perception skills to encode, represent, and remember frequency and time dimensions of sounds (e.g., pitch, formants, duration, and amplitude). Many scholars have argued that individual differences in such auditory perception skills play a key role in the speed, development, and delay in first-language (L1) acquisition (i.e., the auditory-deficit theory; Goswami, 2015; Tallal, 2004).

Auditory processing serves as “the gateway to spoken language” (Mueller et al., 2012, p. 15953), as it anchors every stage of phonological, lexical, and morphosyntactic processing. In order to detect phonetic and phonological categories, it is necessary to encode the relative weights of multiple acoustic cues, such as formant height, shape, and length for vowels (Kuhl, 2000) and approximants (Espy-Wilson et al., 2000), pitch and voice onset time for stop consonants (Shultz et al., 2012), and pitch height and contour for lexical tones (Chandrasekaran et al., 2010). More robust, prompt, and automatic phonetic and phonological analyses directly relate to the activation of contextually appropriate target words (Norris & McQueen, 2008), the detection of word and sentence boundaries (Cutler & Butterfield, 1992), and the refinement of morphological details (Joanisse & Seidenberg, 1998).

In typical language development, auditory sensitivity continues to grow up to the age of 8 to 9 years, followed by a gradually declining curve through older adulthood (Skoe et al., 2015). There is ample evidence that when toddlers experience difficulties at the level of basic lower-level auditory perception, their acquisition of phonetic, phonological, lexical, and morphosyntactic knowledge is slowed down, resulting in a range of global language problems (for a research synthesis, see Hämäläinen et al., 2013). For example, global language skills, such as reading and phonological awareness, are linked to the perception of nonverbal spectral (pitch and formants) and temporal (duration and amplitude) encoding (Foxton et al., 2003; Grube et al., 2012). Thus, there is much correlational evidence showing that dyslexic children are more likely to have auditory deficits (Casini et al., 2018; Goswami et al., 2011; Won et al., 2016). Some scholars have suggested auditory processing measures as a diagnostic tool for dyslexia (Hornickel & Kraus, 2013) and other language-related disorders (Russo et al., 2008).

As for normal-hearing children (i.e., children who have not been diagnosed with specific language impairment or dyslexia), there is ample research examining the relationship between individual differences in auditory processing and language skills (e.g., Anvari et al., 2002; Bavin et al., 2010; Boets et al., 2008; Douglas & Willatts, 1994; Lamb & Gregory, 1993; Talcott et al., 2000; Tierney et al., 2021). In essence, these studies have indicated (a) that children without hearing impairment nonetheless vary in auditory abilities and (b) that this variability is linked to a range of language outcomes (speech-in-noise perception, vocabulary use, literacy, and phonological awareness).

When it comes to normal-hearing adults, similar individual variation has been observed (e.g., Kidd et al., 2007). However, the correlations between auditory processing and speech perception abilities appear to be unclear (but see Ahissar et al., 2000; Surprenant & Watson, 2001). One possible reason for this could be related to the various redundancies in speech perception. Every phonological contrast involves the complex integration of multiple acoustic signals. Due to the existence of multiple redundant cues, when listeners fail to perceive one, they may still accurately perceive the phoneme based on a different cue (e.g., the perception of English stops using voice onset time and/or pitch in following vowels; Toscano & McMurray, 2010). As L1 speakers regularly engage in language-based interactions during which they receive input for prolonged periods of time, accumulating a great deal of relevant speech perception experience, even those with particular auditory deficits may identify/adopt unique cue weighting strategies to optimize speech recognition (e.g., Jasmin et al., 2019 for the case of amusics using duration rather than pitch cues for the normal perception of speech and music).

Domain-general auditory processing in L2 acquisition

More recently, some scholars (e.g., Saito et al., 2020a) have begun to argue not only that auditory processing could explain some variance in adult L2 learners’ speech learning outcomes but also that it may play an even more influential role in L2 than L1 acquisition because of the quantitative and qualitative differences between L1 and L2 learning processes. In L1 acquisition, even infants with auditory perception deficits may overcome acquisition problems with extensive exposure to input for a long period of time (Rosen, 2003). In contrast, adult L2 learners typically have limited exposure to their target language, even under immersion conditions (Jia & Aaronson, 2003). Unlike in L1 learning, the lack of sufficient exposure opportunities may prevent L2 speakers from compensating for any perceptual deficit hindering their L2 comprehension development. In L2 learning contexts, any perceptual advantage or disadvantage can more strongly predict the extent to which L2 learners can benefit from such limited input opportunities (Doughty, 2019).

Compared to L1 acquisition, in which auditory category learning takes place on a blank slate (free of prior phonetic experience), it is important to note that adult L2 learners filter new language input through their already-established auditory representations. In particular, they have to attend to new cues when L2 phonetic and phonological categories differ from L1 (e.g., Japanese speakers must learn to perceive differences in the third formant to acquire English [r] and [l]; Iverson et al., 2003), as well as to adjust and re-tune existing analysis patterns when the cue weightings only partially overlap between L1 and L2 sounds (e.g., Chinese speakers need to deprioritize pitch and prioritize duration cues to acquire English word and sentence stress patterns; Jasmin et al., 2020). Developing and/or adjusting perceptual strategies to rely on new sources of input may draw on the ability to precisely and explicitly encode auditory dimensions, and so individual differences in auditory processing may demonstrate even greater predictive power for adult L2 speech learning success (Doughty, 2019).

With regard to testing the auditory-deficit hypothesis in L2 acquisition, there is a growing body of empirical research featuring a wide range of adult L2 speakers with diverse experience backgrounds. This work has found that individuals with more precise auditory processing abilities can hear and remember unfamiliar sounds and words more quickly when they are exposed to them (e.g., Kempe et al., 2015; Wong & Perrachione, 2007). In addition, individuals who demonstrate greater sensitivity to key acoustic information in a gradient manner (perceiving fine differences within sound categories) are more capable of integrating multiple cues to a single percept (e.g., Kim et al., 2020 for the weighting of vowel quality and quantity in the perception of synthesized vowel contrasts; Kong & Edwards, 2016 for the weighting of voice onset time and pitch in the perception of synthesized stop voicing contrasts; Chandrasekaran et al., 2010 for the weighting of pitch direction and height in English speakers’ perception of Mandarin lexical tones).

When it comes to naturalistic L2 speech learning, it is individual variation in auditory processing that determines learning success even after the length and quality of experience are controlled for (Omote et al., 2017 for segmental and suprasegmental perception; Saito et al., 2020a for segmental and suprasegmental production). Interestingly, the relationship between auditory processing and acquisition tends to be stronger when L2 learners have engaged in a sufficient amount of immersion experience (e.g., > 1 year: Saito et al., 2020), and when the analyses focus on the relatively difficult aspects of L2 speech learning (e.g., phonological accuracy rather than fluency; Saito et al., 2020a). In contrast, the predictive power of auditory processing may be smaller when learners lack opportunities to be exposed to extensive, interactive, and varied aural input (Saito et al., 2021 for classroom L2 learners).

Not surprisingly, all the aforementioned studies have exclusively concerned the relationship between auditory perception and L2 phonology, since the role of auditory input processing is most directly relevant to segmental and suprasegmental acquisition. If we take the theoretical stance that auditory processing is a bottleneck for various dimensions of L2 speech acquisition, the question now becomes the extent to which auditory processing influences the acquisition of higher-level linguistic competence beyond phonological refinements—that is, appropriate use of rich and varied vocabulary items. The current study is designed to address this issue.

Motivation for current study

There are several reasons to predict the presence of a relationship between more precise auditory processing and spoken L2 vocabulary development. Although the relevant work is limited to English, there is some discussion supporting the hypothesis that auditory processing drives the lexical, morphosyntactic, and global aspects of L2 learning.

At the lexical level, the detection of L2 lexical and sentence stress patterns is claimed to be fundamental to segmenting and making input available for word analyses (Field, 2005). Given that these linguistic phenomena are marked by changes in pitch, duration, and amplitude, it is reasonable to assume that individuals with greater sensitivity to the relevant acoustic dimensions can better encode, notice, and internalize novel or L2 prosodic patterns relative to L1 counterparts (Chandrasekaran et al., 2010). Relatedly, speech corpus research has shown that more frequent collocations are characterized by shorter word duration (Gregory et al., 1999) and that more frequent, predictable, and/or redundant words have shorter durations as well as reduced pitch and amplitude range (see also Bybee & Scheibman, 1996 for the relationship between collocational strength and vowel reduction; Shattuck-Hufnagel & Turk, 1996). Thus, we hypothesize that robust prosodic processing may help L2 learners infer from every instance of input not only which parts of speech could be chunked together but also whether they serve as frequent collocational units.

At the morphosyntactic level, it has been shown that the linguistic features with which L2 learners have the most difficulty tend to have fewer phonemes, and low syllabicity and sonority (e.g., Goldschneider & DeKeyser, 2001). According to the prosodic account of L2 grammar (Goad & White, 2019), the accurate encoding of prosodic cues is believed to be a necessary condition for the acquisition of complex morphology (e.g., inflection), syntax (e.g., word order), and semantics (e.g., articles). Thus, we hypothesize that individual differences in pitch, duration, and amplitude rise time may determine the extent to which learners can extract L2 morphosyntactic information from aural input and that those with more precise prosodic processing can demonstrate more advanced L2 morphosyntactic proficiency (e.g., Kachlicka et al., 2019; Saito et al., 2020a).

At the global level, whereas L2 vocabulary knowledge is instrumental to global reading and listening skills, it has been shown that those who have attained highly advanced L2 listening and reading proficiency are likely to have greater working memory, attentional control, and phonological awareness (Vafaee & Suzuki, 2020; Wallace, 2020). Importantly, scholars have also demonstrated that auditory processing and cognitive abilities are interwoven with each other (Ahissar et al., 2006; Grube et al., 2012; Snowling et al., 2018). As such, auditory processing and memory abilities can simultaneously help learners hold aural information for a longer period of time, thereby making it available for more robust acoustic analyses (Zhang et al., 2016).

Given that auditory processing is an important determinant of phonological aspects of L2 speech learning (e.g., Omote et al., 2017), and that the mechanisms underlying the individual differences in L2 lexical production development and attainment have remained open to investigation (Saito, 2015, 2020), the current study explored the relationship between a total of 100 late Polish-English bilinguals’ profiles of auditory processing (pitch, duration, and amplitude rise time), biographical backgrounds (length of immersion, musical training, and age of arrival [AOA]), and spoken vocabulary proficiency (appropriateness and richness). The following research question, followed by predictions, was formulated:

  • To what degree, and how, does auditory processing relate to postpubertal L2 learners’ spoken vocabulary proficiency when biographical factors are controlled for?

According to the cross-sectional and longitudinal investigations, much vocabulary learning can be observed in richness (e.g., Salsbury et al., 2011), and an extensive period of regular and frequent L2 use may be needed to make a perceptible change in appropriateness (e.g., Saito, 2019 for 10+ years of immersion). As stated in the aptitude-acquisition hypothesis (Doughty, 2019), it is in such relatively difficult aspects (i.e., appropriateness rather than richness) that aptitude (including auditory processing) may play a key role in determining the extent to which certain L2 learners can attain advanced L2 lexicogrammatical proficiency (rich and accurate). In particular, more precise auditory processing (pitch, duration, and amplitude) may help L2 learners: (a) segment aural input into words with the accurate use of lexical stress; (b) detect, internalize, and use more frequent, strongly combined collocational chunks in a contextually appropriate manner; and (c) access perceptually non-salient morphological markers (fewer phonemes, low syllabicity, and sonority; e.g., Goldschneider & DeKeyser, 2001 for regular past tense).



The participants were 100 Polish residents in the UK whose pronunciation performance was assessed in the precursor project. The length of immersion varied widely from 0.1 to 19 years. The data collection (speech and auditory processing tests) was conducted with a researcher at a university in London. While the frequency of daily L2 use varied across different contexts (work, home, and family), according to individual interviews, the participants reported that their main language of communication, at work and/or at home, was L2 English. After experiencing 6–15 years of English-as-a-Foreign-Language education in Poland, the participants arrived in the UK after puberty (AOA > 17 years). None reported any hearing or reading problems. All the biographical information is detailed in Table 1.

Table 1. Biographical backgrounds of 100 participants

In the precursor research (Saito et al., 2020a), the differential effects of age, experience, and auditory processing on the participants’ phonological accuracy and fluency were measured. The analyses and findings were based on short speech samples (30 s per participant) elicited using a semi-structured speaking task (i.e., picture description), which would fall short of the length threshold for spoken lexicogrammar analyses. With a view to conducting robust analyses of spoken lexicogrammar, scholars have suggested 100+ words as a minimum length requirement (Koizumi & In’nami, 2012). To elicit sufficiently long spontaneous speech samples, and to capture appropriate use of vocabulary and grammar, longer speech samples from the same participants were elicited using a different task, that is, an oral interview.

Speaking task

Free speech tasks, such as the oral interview task in this study, have been widely used in L2 vocabulary research (e.g., Crossley et al., 2015) and high-stakes speaking-ability tests (e.g., IELTS). First, participants were asked to talk about the following topic (i.e., What was the hardest and toughest challenge in your life?). After 1 min of planning time, they spoke for roughly 2 min. Finally, the researcher asked a few follow-up questions in response to the content of their speech (for the materials used in the study, see Supporting Information-A). Compared to the highly structured task used in the precursor project, wherein participants focused on describing already provided information (picture narratives), the format of the interview task could be considered less structured, encouraging participants to produce longer and more complex speech while talking about more informal, familiar, and personal topics with freedom (see Skehan, 1998).

To control for the effects of phonological quality on L2 analyses, all the recordings were transcribed and cleaned by removing filled pauses (e.g., “ah, eh, um”) and fixing obvious mispronunciation problems (e.g., life, pronounced as rife, would still be spelled as life). The length of the transcripts varied widely (M = 503.1 words, Range = 106–1264 words). Four researchers initially transcribed the same five speech samples (out of the entire dataset of 100 speech samples) to compare their agreement. While their transcripts largely agreed with each other, they discussed any discrepancies and agreed on some transcription conventions (see Supporting Information-B). Afterward, the remaining 95 samples were divided between the four researchers, each of whom individually transcribed 20–25 samples. Whenever they encountered ambiguous situations, they consulted with each other to ensure that they had consistently followed the agreed conventions.
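The cleaning step described above can be illustrated with a minimal sketch. It is not the procedure used in the study, where transcription was done by hand following the agreed conventions. The filler list contains only the three examples given in the text, and the function names, the regular-expression tokenizer, and the sample sentence are hypothetical. The 100-word threshold is the minimum length cited from Koizumi and In’nami (2012).

```python
import re

# Assumption: only the three fillers named in the text; a real convention
# sheet (Supporting Information-B) may list more.
FILLED_PAUSES = {"ah", "eh", "um"}

# Minimum transcript length suggested for spoken lexicogrammar analyses.
MIN_WORDS = 100


def clean_transcript(text: str) -> list[str]:
    """Lowercase the transcript, split it into word tokens, drop filled pauses."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in FILLED_PAUSES]


def long_enough(tokens: list[str], minimum: int = MIN_WORDS) -> bool:
    """Check whether a cleaned transcript meets the minimum word count."""
    return len(tokens) >= minimum


# Hypothetical sample utterance, not taken from the dataset.
sample = "Um, the hardest challenge, eh, was moving to London. Ah, I knew nobody."
tokens = clean_transcript(sample)
print(tokens)
print(len(tokens), long_enough(tokens))
```

Mispronunciation repair (e.g., spelling rife as life) depends on a transcriber’s judgment and is deliberately left out of the sketch.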

Analyses of appropriateness

To capture the multifaceted nature of appropriateness, three different approaches were adopted:

Holistic appropriateness

To account for the potentially different degrees of error gravity on global comprehension and communicative adequacy, scholars have emphasized the importance of expert raters’ holistic judgments (Foster & Wigglesworth, 2016). Following the training procedure in Saito (2019), a total of five linguistically trained raters were recruited to assess semantic and morphosyntactic dimensions of appropriateness.


The raters included three native speakers of English (two from the UK and one from the USA) and two near-native speakers of English (one from Estonia and one from Germany). All of them had received several years of linguistics training at universities in London and had a significant amount of experience in L2 speech analyses as they regularly participated in empirical research projects of this kind. They reported high levels of familiarity with vocabulary use in British English and foreign-accented English in the UK. As reported below, all the raters demonstrated relatively high inter-rater agreement.


The rating sessions took place individually under the supervision of a researcher. The raters first received definitions for the two different areas of appropriateness: (a) semantic and (b) morphosyntactic. For training scripts and onscreen labels, see Supporting Information-C. During the assessment, the samples were displayed on a computer screen in a randomized order using MATLAB software. For each token, the degree of appropriateness was assessed using a moving slider. If the slider was placed at the leftmost end of the continuum, labeled with a frowning face (indicating “non-targetlike”), the rating was recorded as 0. If the slider was placed at the rightmost end of the continuum, labeled with a smiley face (indicating “targetlike”), the rating was recorded as 1000. The scoring method was explicitly explained to the raters. None of them asked any questions. To avoid any confusion (as reported in some L2 assessment research using a numbered scale; Isaacs & Thomson, 2013), no numerical values were displayed on the screen.

To ensure the raters’ understanding of the procedure, they evaluated three practice transcripts (not included in the main dataset) and explained/justified their decisions. For each response, the researcher gave feedback to ensure that the raters handled the three different categories without confusion. Finally, the raters moved on to the main dataset of 100 transcripts.


In our pilot run, the length of a session turned out to be a problem as some transcripts were long (> 1000 words). To reduce rater fatigue, all transcripts were cut down to 250 words, except for several samples that were already shorter than 250 words. Each session lasted for approximately 3 hr (including training and practice), and the raters took a short break (10 min) halfway through. A Cronbach’s alpha analysis revealed that the five raters demonstrated relatively strong agreement for semantic appropriateness (α = .81) and morphosyntactic appropriateness (α = .83). According to the post-rating questionnaire, the raters reported that they not only understood but also handled the three rubrics throughout the judgment sessions without confusion (M = 9 on a scale from 1 = “very difficult” to 9 = “very easy and comfortable”). The five raters’ scores were averaged to generate two scores for each transcript, quantifying its semantic and morphosyntactic appropriateness.
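The agreement statistic and the averaging step described above can be sketched as follows. This is a minimal illustration that treats raters as the "items" of Cronbach's alpha; the function names and the example ratings are hypothetical and are not taken from the study's materials.

```python
from statistics import mean, variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters treated as items.

    ratings_by_rater: list of k lists, each holding one rating per transcript
    (same transcript order for every rater).
    """
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings_by_rater[0])
    if n < 2 or any(len(r) != n for r in ratings_by_rater):
        raise ValueError("every rater must rate the same (two or more) transcripts")
    # Sum of each rater's variance across transcripts.
    sum_rater_var = sum(variance(r) for r in ratings_by_rater)
    # Variance of the per-transcript totals across raters.
    totals = [sum(col) for col in zip(*ratings_by_rater)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_rater_var / total_var)


def average_ratings(ratings_by_rater):
    """Mean rating per transcript across raters (the score used downstream)."""
    return [mean(col) for col in zip(*ratings_by_rater)]


# Hypothetical slider ratings (0-1000) from three raters for four transcripts.
example = [
    [200, 550, 800, 400],
    [250, 500, 900, 350],
    [150, 600, 850, 450],
]
alpha = cronbach_alpha(example)
per_transcript = average_ratings(example)
```

Higher alpha values indicate that the raters ordered the transcripts in a similar way, which is what licenses averaging their scores into a single value per transcript.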

Local appropriateness

Given that the task was designed to elicit the use of past tense, local morphosyntactic accuracy was operationalized by dividing the number of past tense errors by the number of obligatory contexts per sample (for a similar approach, see Kourtali & Révész, 2020). The past tense in English is considered perceptually non-salient (Goldschneider & DeKeyser, 2001), semantically redundant and less intrusive to communicative success (VanPatten, 2002), and difficult for many adult L2 learners of English (Ellis et al., 2006). Two linguistically trained coders first analyzed 20 (out of 100) samples. The inter-coder agreement was relatively high (r = .91). The first coder completed the rest of the analyses. The participants’ accuracy ratio varied widely (M = 24.9%; SD = 18.3; Range = 0–92%). Since the obligatory context analysis could be influenced by text length, we also calculated residual accuracy scores with the length factor statistically controlled for.
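The ratio and the length-controlled residual scores can be illustrated with a short sketch. The text does not state how length was controlled for; the version below assumes a simple linear regression of the ratio on transcript length, and all names and numbers are hypothetical.

```python
from statistics import mean


def error_ratio(errors, obligatory_contexts):
    """Past tense errors divided by obligatory contexts for one sample."""
    if obligatory_contexts <= 0:
        raise ValueError("a sample needs at least one obligatory context")
    return errors / obligatory_contexts


def residualize(scores, lengths):
    """Residuals of scores after regressing them on lengths (ordinary least squares).

    Assumption: one linear predictor (transcript length in words).
    """
    if len(scores) != len(lengths) or len(scores) < 2:
        raise ValueError("scores and lengths must be paired and non-trivial")
    mx, my = mean(lengths), mean(scores)
    sxx = sum((x - mx) ** 2 for x in lengths)
    if sxx == 0:
        raise ValueError("lengths must vary across samples")
    slope = sum((x - mx) * (y - my) for x, y in zip(lengths, scores)) / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(lengths, scores)]


# Hypothetical samples: (errors, obligatory contexts, transcript length in words).
samples = [(3, 12, 150), (10, 25, 480), (2, 30, 900), (6, 20, 300)]
ratios = [error_ratio(e, c) for e, c, _ in samples]
residual_scores = residualize(ratios, [length for _, _, length in samples])
```

A positive residual marks a speaker whose ratio is higher than their transcript length alone would predict, and a negative residual marks the opposite.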

Collocational appropriateness

Collocation is broadly defined as a meaningful combination of multiword expressions (Gablasova et al., 2017) and has been found to serve as a primary determinant of humans’ intuitive judgments of lexical appropriateness (Saito, 2020). To this end, two corpus-based association measures were used: Mutual Information (MI) bigram and trigram. Conceptually, MI indicates the strength of the partnership between two- and three-word expressions, while controlling for the probability of random groupings of words. Collocations with higher MI scores consist of combinations of words which likely have fewer partner words. These words likely exhibit greater coherence, more distinctive meaning, and clearer discourse functions. To calculate MI, random co-occurrences of words were first estimated by dividing the number of any possible combinations within a fixed window size (n = ± 5 words in TAALES) by the total number of tokens in the reference corpus (British National Corpus). Then, the frequency of collocations was divided by the frequency of random co-occurrence among the words and then logarithmized.
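As a rough illustration of this calculation, the sketch below computes MI for a two-word combination as the logarithm of observed over expected co-occurrence frequency. The function and parameter names, the base-2 logarithm, and the way the window span enters the expected frequency follow common corpus-linguistic practice and are assumptions here; they may differ from what TAALES does internally, and the counts are invented.

```python
import math


def mutual_information(pair_freq, freq_w1, freq_w2, corpus_size, window_span=1):
    """MI = log2(observed / expected) for a two-word combination.

    expected = freq_w1 * freq_w2 * window_span / corpus_size, i.e. how often the
    two words would co-occur by chance within the window in the reference corpus.
    """
    if min(pair_freq, freq_w1, freq_w2, corpus_size, window_span) <= 0:
        raise ValueError("all counts must be positive")
    expected = freq_w1 * freq_w2 * window_span / corpus_size
    return math.log2(pair_freq / expected)


# Invented counts: two words that each occur 1,000 times in a 1,000,000-token
# corpus and co-occur 32 times, which is 32 times more often than chance.
mi_strong = mutual_information(32, 1000, 1000, 1_000_000)
# A pair that co-occurs exactly as often as chance predicts has MI = 0.
mi_chance = mutual_information(1, 1000, 1000, 1_000_000)
```

Because the expected frequency grows with the frequency of the component words, combinations of rare words that reliably appear together receive the highest scores.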

Analyses of richness

The multifaceted nature of richness was approached from three different perspectives via TAALES 2.0 (Kyle & Crossley, 2015):


Word frequency refers to the extent to which less frequent and less common words are used per sample. The index was calculated by dividing the total sum of frequency scores in reference to the British National Corpus by the number of all the words with frequency scores. In order to control for Zipfian effects in word frequency lists (higher-frequency words are more likely to be recycled), the raw scores were logarithmically transformed. Lower frequency scores indicate the use of less frequent words, which is characteristic of more advanced L2 lexical proficiency (Crossley & McNamara, 2009).


Word range refers to the extent to which L2 speakers used more specific words which are narrowly used and observed in certain contexts and genres (rather than across diverse contexts). The index was calculated by dividing the total sum of range scores by the number of words in the texts with range scores. Like frequency, the raw scores were logarithmically transformed. Lower range scores indicate the use of more context-specific words (restricted to certain genres), which is considered an index of more advanced L2 lexical proficiency (Kyle & Crossley, 2015).


Abstractness refers to the extent to which words that are less concrete, imageable, and familiar are used per sample. In TAALES, native speakers’ perceived judgments of concreteness and imageability were stored for 4,000 content words based on the MRC Psycholinguistic Database (Coltheart, 1981). The average concreteness, imageability, and familiarity scores were calculated for each transcript (0–1000 points). Thus, those who often use words with lower judgment scores (i.e., less concrete, imageable, and familiar words) could be considered to have more advanced L2 lexical proficiency (Salsbury et al., 2011).

Auditory processing measures

Following the methodology widely used in cognitive psychology (e.g., Surprenant & Watson, 2001), participants’ domain-general perception ability was assessed using a battery of psychophysical assessments. The materials used for the current study were developed in precursor studies (e.g., Kachlicka et al., 2019). As reviewed earlier, the acoustic dimensions relevant to L2 vocabulary acquisition were assumed to involve participants’ thresholds for discrimination of pitch, duration, and amplitude. For each test, three complex tone stimuli were presented, with either the first or the third sounding different from the other two. Participants indicated which sound was different by pressing either the number “1” or “3” on a keyboard. An adaptive three-alternative forced-choice procedure was used, such that the difficulty of the task would decrease after every incorrect response and increase after every third correct response. The program continued until eight reversals had been reached, that is, incorrect answers after a string of successes or correct answers after a string of failures.


For each test, 100 continuous synthesized stimuli (500 ms in length) were created via custom MATLAB scripts. They differed at 100 steps along the target acoustic dimension (Levels 1–100). A total of 100 four-harmonic complex tones were created with F0 set to 330 Hz and the amplitude of each harmonic set to 40 dB. The target acoustic dimension for each test varied by a step of 0.3 Hz in F0 (330.3–360 Hz), 2.5 ms in duration (252.5–500 ms), and 1.22 ms in amplitude rise time (178–300 ms), respectively.


When the three tones were presented with an inter-stimulus interval of 0.5 s, the participants were asked to choose which of the three tones differed from the other two by pressing the number “1” or “3.” Based on Levitt’s (1971) adaptive threshold procedure, the level of difficulty changed from trial to trial according to participants’ performance. The initial difficulty, that is, the level of the target stimulus, was set to level 50. When three correct responses were made in a row, the difference became smaller by a degree of 10 steps (more difficult). When their response was incorrect, the difference became wider by a degree of 10 steps (easier).

A reversal occurred when the direction of difficulty between trials changed, that is, when an increase in acoustic difference (easier) was followed by a decrease (more difficult), or vice versa. After the first reversal, the step size decreased from 10 to 5, and then from 5 to 1 after the second reversal. The tests stopped after either 70 trials or 8 reversals. For participants’ auditory processing scores, the stimulus levels after the third reversal were averaged. Since the scores indicate how small a difference participants can perceive, lower scores indicate more precise auditory processing.
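The adaptive procedure described above can be sketched as a small simulation. This is not the study's MATLAB code: the function name, the `respond` callback, the clamping of levels to 1–100, the choice to apply the smaller step from the reversing move onward, and the reading of the score as the mean of the reversal levels from the third reversal onward are all assumptions made for illustration.

```python
from statistics import mean


def run_staircase(respond, start_level=50, max_trials=70, max_reversals=8):
    """Simulate the staircase; respond(level) returns True for a correct answer.

    Level 1 is the smallest acoustic difference (hardest), level 100 the largest.
    Returns the mean reversal level from the third reversal onward, or None if
    fewer than three reversals occurred.
    """
    level = start_level
    step = 10
    correct_streak = 0
    last_direction = 0  # -1 = harder, +1 = easier, 0 = no move yet
    reversal_levels = []

    for _ in range(max_trials):
        if respond(level):
            correct_streak += 1
            if correct_streak < 3:
                continue  # level only changes after three correct in a row
            correct_streak = 0
            direction = -1  # smaller difference, more difficult
        else:
            correct_streak = 0
            direction = 1  # larger difference, easier

        if last_direction != 0 and direction != last_direction:
            reversal_levels.append(level)
            if len(reversal_levels) == 1:
                step = 5
            elif len(reversal_levels) == 2:
                step = 1
            if len(reversal_levels) >= max_reversals:
                break
        last_direction = direction
        level = min(100, max(1, level + direction * step))

    if len(reversal_levels) < 3:
        return None
    return mean(reversal_levels[2:])


# An idealized listener who is always correct at level 20 or above and always
# wrong below it; the estimate settles just under that cutoff.
threshold = run_staircase(lambda level: level >= 20)
```

With a real listener, responses near threshold are probabilistic, so the track oscillates around the level at which three correct responses in a row are about as likely as a single error.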


To check the reliability of the auditory processing tests, a follow-up project was conducted with 30 English users with diverse experience and proficiency levels (not included in the current study). They took a range of auditory processing tests (including pitch, duration, and amplitude discrimination) twice with an interval of 1 day. The results of Spearman’s correlation analyses demonstrated small-to-medium strength for the individual tests (r = .632 for pitch, .333 for duration, and .737 for rise time). As for the composite auditory processing scores (averaging pitch, duration, and rise time discrimination), the reliability (r = .720) could be considered satisfactory and comparable to similar research (e.g., r = 0.75 in Raz et al., 1987). The results suggest that although using individual test scores may result in low reliability (e.g., duration discrimination), composite test scores may serve as a more reliable proxy of one’s auditory precision (for methodological details, see Brief Report in Saito et al., 2020b).

Composite scores

The descriptive results of pitch, duration, and amplitude rise time discrimination test scores are summarized in Supporting Information-D. Since the data significantly differed from the normal distribution (p < .01), the raw scores were transformed via a log10 function. To calculate participants’ overall prosodic encoding abilities, their scores were standardized and averaged. According to the results of the normality test (Kolmogorov–Smirnov), the resulting averaged scores were comparable to the normal distribution (p > .05) and thus were used for the subsequent analyses as a composite index of participants’ auditory processing of prosodic cues. Lower composite scores indicate more precise encoding of pitch, duration, and amplitude information.
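The construction of the composite index can be sketched as follows. The sketch assumes that the log10-transformed thresholds were the values standardized, that the sample standard deviation was used, and that the three tests were weighted equally; the function names and the threshold values are hypothetical.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("values must vary to be standardized")
    return [(v - m) / s for v in values]


def composite_scores(pitch, duration, rise_time):
    """Average of z-scored, log10-transformed thresholds per participant.

    Each argument holds one positive threshold per participant, in the same order.
    """
    tests = (pitch, duration, rise_time)
    if len({len(t) for t in tests}) != 1:
        raise ValueError("all tests need the same participants")
    standardized = [z_scores([math.log10(v) for v in t]) for t in tests]
    return [mean(triple) for triple in zip(*standardized)]


# Hypothetical thresholds (stimulus levels) for three participants.
composite = composite_scores(
    pitch=[10, 20, 40],
    duration=[5, 10, 20],
    rise_time=[2, 4, 8],
)
```

Standardizing before averaging keeps any one test from dominating the composite simply because its thresholds happen to span a wider numeric range.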

For the sound stimuli used in the auditory processing tests (duration, pitch, and rise time), see the team’s website (under construction).


First, we present the results of preliminary analyses to examine what characterizes spoken L2 vocabulary proficiency among 100 Polish-English bilinguals relative to L1 counterparts. Second, we show the results of factor analyses to explore what underlies spoken L2 vocabulary proficiency (which we analyzed via 11 outcome measures) and auditory processing abilities (which we analyzed via 3 outcome measures). Subsequently, we present the results of multiple regression analyses to probe how a range of predictor variables related to experience and auditory processing are uniquely associated with various dimensions of participants’ L2 lexical proficiency.

Spoken L2 versus L1 vocabulary proficiency

The descriptive results of the 11 vocabulary measures are summarized in Table 2. To examine what characterizes spoken L2 vocabulary proficiency, a set of 95% confidence interval analyses were performed to check the extent to which Polish-English bilinguals’ performance overlapped with (or deviated from) that of L1 speakers. In the prior project (Saito, forthcoming), a total of 10 monolingual speakers of English (born and raised in the English-speaking areas of Canada) completed the same oral interview task. The results indicated two overall patterns: (a) some Polish-English bilinguals reached nativelike proficiency in terms of richness (overlaps in 95% intervals in all measures) and (b) appropriateness could be considered as a relatively difficult dimension of spoken L2 vocabulary proficiency as L2 speakers’ proficiency was significantly distinguishable from the native benchmark in five of six measures (i.e., lexical, morphosyntactic, and collocational accuracy).

Table 2. Descriptive summary of spoken L2 vocabulary proficiency relative to native benchmark

Note. a The native control data derive from Saito (forthcoming).

Constructs of spoken L2 vocabulary proficiency

To check whether and to what degree the 11 vocabulary measures tapped into the constructs that we intended to measure (n = 6 for appropriateness and n = 5 for richness), they were submitted to an exploratory factor analysis with Varimax rotation. The factorability of the entire dataset was considered adequate according to Bartlett’s test of sphericity (χ² = 1506.938, p < .001) and the Kaiser–Meyer–Olkin measure of sampling adequacy (.659). Using the standard of an eigenvalue beyond 1.0, a five-factor solution was suggested, accounting for 90.713% of the variance in the outcomes of the vocabulary measures.

In terms of factor loadings, 0.6 was used as a cutoff point in line with Hair et al.’s (1998) recommendation for factor analyses of relatively small sample size (n < 100). In light of the grouping patterns in Table 3, Factor 1 was labeled as “holistic accuracy” as it clustered both of the appropriateness judgment scores, Factor 2 was labeled as “breadth” as it corresponded to the use of infrequent, context-specific, and unfamiliar words on a surface level, Factor 3 was labeled as “local accuracy,” Factor 4 was labeled as “abstractness” as it clustered the MRC Psycholinguistic Database measures of word concreteness and imageability, and Factor 5 was labeled as “collocational accuracy” as it included both corpus-based n-gram measures. According to the results of the Kolmogorov–Smirnov tests, the distribution of the resulting factor scores was not significantly different from the normal distribution (p > .05), and thus the scores were used for the subsequent analyses without transformation.

Table 3. Summary of a five-factor solution based on a factor analysis of spoken L2 lexicogrammar proficiency

Note. a The direction of the factor scores was reversed to proxy what the original scores indicate (more accurate and more collocational).

Roles of experience and auditory processing in spoken L2 vocabulary

To examine the relative weights of the biographical and auditory processing factors in the outcomes of spoken L2 vocabulary proficiency, a set of stepwise multiple regression analyses was conducted on each proficiency dimension with a set of predictors related to auditory processing and experience. To avoid multicollinearity problems, the composite auditory processing scores (pitch, duration, and rise time discrimination) were used as a global index of auditory processing. Four experience factors were included as they have been extensively discussed in the existing literature as crucial factors affecting L2 speech acquisition (Flege, 2018 for AOA; Saito, 2015 for LOR; Flege & Liu, 2001 for Current L2 Use; Muñoz, 2014 for Length of EFL; see Footnote 1). The mechanisms underlying L2 speech learning are said to differ between the early and later phases of immersion (DeKeyser, 2013). To this end, five interaction terms were included to see whether and to what degree the five predictors differentially related to L2 vocabulary proficiency between two different groups of L2 learners; dummy codes (1 and 2) were given to the interlanguage learner group (n = 50; LOR = 0.1–5 years) and the ultimate attainer group (n = 50; LOR = 6+ years). Finally, given that the length of participants’ speech varied widely (106–1264 words), this variable was also entered as a covariate. For each of the vocabulary proficiency dimensions (holistic accuracy, local accuracy, collocational accuracy, breadth, and abstractness), the following model was constructed:

  • Vocabulary Proficiency = Auditory Processing + Age of Arrival + Length of Residence + Current L2 Use + Length of EFL + Length of Speech + Auditory Processing × Group + Age of Arrival × Group + Length of Residence × Group + Current L2 Use × Group + EFL × Group

Model selection was conducted via SPSS based on the results of F tests. Backward elimination was chosen. After all the independent variables were entered, the variable with the largest probability of F was removed at each step (using p = .10 as a benchmark). The selection was completed when no variables were eligible for elimination. The details of the model building processes for each vocabulary domain are provided in Supporting Information-E.
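The elimination loop can be sketched in a few lines. This is only a schematic of the control flow, not of SPSS itself: the function name and the `p_values_for` callback (which in practice would refit the regression on the remaining predictors and return each predictor's probability of F) are hypothetical, and the p values in the example are invented.

```python
def backward_eliminate(predictors, p_values_for, threshold=0.10):
    """Repeatedly drop the predictor with the largest p value until all fall below threshold.

    p_values_for(current) must return a dict mapping each predictor in `current`
    to its p value in a model refitted with only those predictors.
    """
    current = list(predictors)
    while current:
        p_values = p_values_for(current)
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] < threshold:
            break  # no variable is eligible for elimination
        current.remove(worst)
    return current


# Toy callback with fixed, invented p values; a real callback would refit the
# model at every step, so p values would change as predictors are removed.
toy_p = {
    "Auditory Processing": 0.001,
    "Age of Arrival": 0.04,
    "Length of Residence": 0.20,
    "Current L2 Use": 0.60,
}
retained = backward_eliminate(list(toy_p), lambda current: toy_p)
```

Removing one variable at a time matters because predictors share variance: once the weakest is dropped, the remaining p values can shift, so each step needs a fresh fit.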

The final models are summarized in Table 4. The results generally showed that L2 accuracy was primarily predicted by auditory processing factors (composite prosodic processing scores) and secondarily by biographical factors (LOR and AOA). More specifically, the link between auditory processing and acquisition was weak in local accuracy (related to the use of past tense) relative to holistic and collocational accuracy (related to vocabulary use in general). The biographical factors related differently to different types of accuracy. Holistic accuracy was tied to LOR, and local accuracy was associated with AOA. Interestingly, all the interaction effects were excluded from the final models in all instances, suggesting that the findings were generalizable across different stages of L2 acquisition (LOR = 0.1 to 40 years). None of the models of the richness factors (breadth and abstractness) reached statistical significance (p > .05). No clear sign of multicollinearity was found in any contexts (variance inflation factor < 1.231; see Footnote 2).

Table 4. Summary of stepwise multiple regression models featuring only significant predictors of spoken L2 vocabulary proficiency

Note. a Lower scores indicate lower error ratio (more accurate use of past tense).


Drawing on the auditory deficit theory in L1 acquisition, there is an emerging hypothesis that individual differences in experience, auditory processing, and L2 acquisition are interwoven (Mueller et al., 2012). According to the precursor research, auditory processing is an important determinant of segmental and suprasegmental accuracy (rather than fluency) aspects of L2 speech, even when all the biographical factors (age, immersion experience, and music training) are controlled for (Kachlicka et al., 2019; Saito et al., 2020a). To further scrutinize the generalizability of the topic to higher-order dimensions of postpubertal L2 speech acquisition, we aimed to examine the effects of auditory processing in spoken L2 vocabulary development and attainment among a total of 100 late Polish-English bilinguals in the UK.

According to the results of the statistical analyses, L2 learners who attained more advanced L2 vocabulary proficiency had not only more relevant experience (extensive immersion and earlier AOA), but also more precise auditory processing ability. As predicted earlier, our findings here generally align with the view that one’s ability to track individual dimensions of prosodic information (i.e., pitch, duration, and amplitude) serves as a key driving force for detecting lexical and syntactic boundaries (De Pijper & Sanderman, 1994). Thus, it is possible that with more precise prosodic processing abilities, learners can better represent, encode, and segment ambient input into lexical and syntactic units, resulting in the development of more robust phonological and morphosyntactic knowledge (Jiang, 2000; Best & Tyler, 2007). Additionally, more precise auditory processing abilities are linked to greater phonological awareness and executive function, which in turn facilitates L2 reading and listening complementarily (Linck et al., 2013). Finally, those with more precise sound timing may detect more closely and frequently used multiword units, as they are delivered faster than other less common and less predictable combinations of words (Gregory et al., 1999).

Importantly, auditory processing could be fundamental, especially concerning appropriateness rather than breadth and abstractness dimensions of L2 lexicogrammar development. This is arguably because the former dimensions (appropriateness) are claimed to be more difficult than the latter dimensions (breadth and abstractness). As shown in the current study (the results of the benchmark analyses), there was a considerably larger distance between L1 and L2 speakers in appropriateness than in breadth and abstractness. The development of accuracy has been found to take place over a great deal of immersion experience at lexical (Saito, 2019) and morphosyntactic levels (Bartning et al., 2012). According to the aptitude-acquisition view, it is the relatively difficult L2 learning aspects that are subject to a substantial amount of L2 experience and susceptible to the effects of individual differences in aptitude (Doughty, 2019). In terms of the breadth and abstractness aspects of spoken L2 vocabulary proficiency, the participants were comparable with each other regardless of experience and auditory profiles. This is arguably because many L2 learners’ vocabulary use could be sufficiently abstract even without much immersive experience (see Saito, 2019; Salsbury et al., 2011).

While the facilitative role of auditory processing is germane to higher-level linguistic skills to some degree, such as the production of L2 vocabulary, it is also important to remember that the outcomes of spoken L2 vocabulary development are only modestly related to auditory processing. In fact, the strength of the audition-acquisition link could be considered small (e.g., r = −.346 for holistic accuracy in Table 4). In prior research, the predictive power of auditory perception appeared to be more clearly observed in lower-order linguistic skills which directly involve auditory information, such as segmental and suprasegmental perception (e.g., Kachlicka et al., 2019 for r = −.6) and production (e.g., Saito et al., 2020a for r = −.4 to −.5). Therefore, it would be intriguing to further examine whether and to what degree other cognitive measures may explain the remaining variance in spoken L2 vocabulary acquisition. Such potential predictors include working memory (Martin & Ellis, 2012), selective attention (Nicolay & Poncelet, 2013), and foreign language aptitude (Li, 2016).


Given that the current study took an exploratory approach to delving into the role of auditory processing in spoken L2 vocabulary development, there are several methodological limitations that future studies should address. First, all the findings were based on the cross-sectional analyses of 100 late Polish-English bilinguals. To further examine the causal relationship between auditory processing, experience, and L2 speech learning, it is necessary to conduct a longitudinal investigation. For example, future studies should explore the variance in phonological and lexical aspects of L2 proficiency in participants with various auditory processing profiles over a certain period of training (Chandrasekaran et al., 2010) and immersion (Sun et al., 2021).

Secondly, participants’ auditory processing was analyzed via the psychoacoustic tests. However, it has been argued that the test format (AXB discrimination) may not only reflect participants’ auditory precision but also involve a range of cognitive abilities, such as attentional control (Snowling et al., 2018). To control for the separate effects of perceptual and cognitive individual differences, future studies should adopt both auditory processing and executive function tests (cf. Saito et al., forthcoming for the relationship between memory, auditory processing, and L2 speech learning).

Thirdly, whereas participants’ spoken vocabulary proficiency was elicited from a single-task condition (oral interview), it has been shown that L2 learners’ speech performance is susceptible to change as per task conditions (see Ellis, 2009 for an overview on task effects on appropriateness, richness, and fluency). The findings of the current investigation need to be replicated using multiple tasks differing in terms of the timing and length of planning time (Ahmadian & Tavakoli, 2011), the degree of structural complexity (Foster & Tavakoli, 2009), and conceptualization (Saito, forthcoming).

Fourthly, the generalizability of the findings (i.e., prosodic processing vs. spoken L2 vocabulary) needs to be tested for diverse L1–L2 pairings. Although we argued that prosodic acuity matters for L2 vocabulary acquisition due to its relevance to word segmentation, it is important to note that the relative weights of prosodic cues may be highly language-specific. For example, it would be interesting to replicate the findings in L2 French speakers who use stress to parse linguistic units at sentence but not word level (e.g., Dupoux et al., 1997 for the cross-linguistic differences in word and sentence stress assignment and its impact on tone deafness).

Finally, whereas the current study indicated a potential link between auditory processing and the acquisition of L2 English past tense, it needs to be acknowledged that little is known about how auditory processing is related to L2 morphosyntax at a fine-grained level. In the field of second-language acquisition, a growing amount of attention has been directed toward disentangling how phonology interfaces with various areas of grammar (for a comprehensive summary of the prosodic account of L2 behaviors, see Goad & White, 2019). Given that Goldschneider and DeKeyser (2001) presented a plausible hierarchical framework for perceptual acuity and morphosyntactic learning, one promising enquiry concerns the extent to which L2 learners with different levels of auditory processing abilities master L2 morphosyntax with different levels of perceptual salience (e.g., sonority). There is a possibility that individual differences in auditory processing (a core component of phonology) may be integral to the acquisition of grammar which interfaces with lexicon, morphology, and syntax (e.g., inflection; Austin et al., 2021) and semantics and discourse (e.g., articles; Demuth & McCullough, 2009).


All in all, our findings concur with the mounting empirical evidence that auditory processing is a determinant of how much L2 learners can benefit from immersion experience, resulting in more advanced outcomes (Saito et al., 2020a), and the theoretical view that the same driving faculty of L1 acquisition (i.e., auditory processing) is tied to every stage of L2 acquisition throughout an individual’s lifetime (Flege, 2018). Building on the prior work (e.g., Saito et al., 2020a for segmental and suprasegmental production), we add that such audition effects are more clearly observed not only in the acquisition of relatively difficult features (accuracy rather than fluency, breadth, and abstractness), but also in the dimensions more closely related to the speech signal (phonology rather than lexicogrammar). Interestingly, hearing research has shown that auditory deficits can be remedied via focused training (e.g., Carcagno & Plack, 2011 for 10 hr of pitch discrimination training).
In light of the significant relationship between auditory processing and L2 speech learning (though its strength varies across different linguistic dimensions), our study hints at the possibility that auditory training may help L2 learners amplify and optimize their acquisition processes, if it is provided at the same time that they engage in a certain period of immersive experience in a target language-speaking country (e.g., study abroad), or when they receive intensive and/or meaning-oriented speech training (e.g., Barriuso & Hayes-Harb, 2018 for high-variability phonetic training; Lee & Lyster, 2016 for focus on form; Lim & Holt, 2011 for incidental video-gaming; Mora & Levkina, 2017 for task-based pronunciation teaching; Shao et al., 2022 for repetition-based training). This represents a new interdisciplinary direction that linguistics, psychology, education, and hearing researchers can further pursue together.


This study was funded by Leverhulme Trust Research Grant (RPG-2019-039), Spencer Foundation Grant (202100074), and ESRC Connection Grant (ES/S013024/1). We gratefully acknowledge insightful comments from anonymous Applied Psycholinguistics reviewers on earlier versions of the manuscript.

Ethical standards

This project obtained ethical approval from the University of London.


1. Six participants did not report the length of foreign language education prior to their arrival in the UK. Their missing values were replaced with the sample average (i.e., 9.5 years). Not surprisingly, neither of the age-related variables (Chronological Age, Age of EFL) was significantly correlated with any aspect of L2 vocabulary proficiency attainment (p > .05). This corresponds to the existing research evidence that what matters for L2 speech acquisition is age of arrival rather than chronological age (e.g., Flege, 2018), and the length of EFL rather than age of learning (e.g., Muñoz, 2014).

2. The low variance inflation factors (all < 1.231) suggest that participants’ biographical backgrounds and auditory processing abilities were relatively independent of each other, at least within the current dataset. In cognitive psychology, it has been shown that auditory processing is susceptible to change in relation to chronological age (Skoe et al., 2015), music training (Zendel & Alain, 2012, for musicians vs. non-musicians), the tonal status of the first language (Bidelman et al., 2011, for tonal vs. non-tonal language users), and bilingual experience (Krizman et al., 2015, for simultaneous vs. sequential bilinguals). However, little is known about the biographical correlates of auditory processing among adult second-language learners (cf. Saito et al., in press).
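
As a reading aid for Note 2, the reported ceiling can be translated into shared variance among predictors. This is a sketch that assumes the standard definition of the variance inflation factor, which the article does not spell out; $R_j^2$ here denotes the proportion of variance in predictor $j$ explained by the remaining predictors.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j} < 1 - \frac{1}{1.231} \approx 0.19
\]
```

Under that assumption, no predictor shared more than roughly 19% of its variance with the others.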


Abrahamsson, N., & Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning, 59(2), 249–306.
Ahissar, M., Lubin, Y., Putter-Katz, H., & Banai, K. (2006). Dyslexia and the failure to form a perceptual anchor. Nature Neuroscience, 9, 1558–1564.
Ahissar, M., Protopapas, A., Reid, M., & Merzenich, M. M. (2000). Auditory processing parallels reading abilities in adults. Proceedings of the National Academy of Sciences, 97, 6832–6837.
Ahmadian, M. J., & Tavakoli, M. (2011). The effects of simultaneous use of careful online planning and task repetition on accuracy, complexity, and fluency in EFL learners’ oral production. Language Teaching Research, 15(1), 35–59.
Anvari, S. H., Trainor, L. J., Woodside, J., & Levy, B. A. (2002). Relations among musical skills, phonological processing, and early reading ability in preschool children. Journal of Experimental Child Psychology, 83, 111–130.
Austin, G., Chang, H., Kim, N., & Daly, E. (2021). Prosodic transfer across constructions and domains in L2 inflectional morphology. Linguistic Approaches to Bilingualism.
Barriuso, T. A., & Hayes-Harb, R. (2018). High variability phonetic training as a bridge from research to practice. CATESOL Journal, 30, 177–194.
Bartning, I., Lundell, F. F., & Hancock, V. (2012). On the role of linguistic contextual factors for morphosyntactic stabilization in high-level L2 French. Studies in Second Language Acquisition, 34, 243–267.
Bavin, E. L., Grayden, D. B., Scott, K., & Stefanakis, T. (2010). Testing auditory processing skills and their associations with language in 4–5-year-olds. Language and Speech, 53, 31–47.
Best, C. T., & Tyler, M. (2007). Nonnative and second-language speech perception. In O.-S. Bohn & M. J. Munro (Eds.), Language experience in second language speech learning: In Honour of James Emil Flege (pp. 13–34). John Benjamins Publishing.
Boets, B., Wouters, J., Van Wieringen, A., De Smedt, B., & Ghesquiere, P. (2008). Modelling relations between sensory processing, speech perception, orthographic and phonological ability, and literacy achievement. Brain and Language, 106, 29–40.
Bybee, J., & Scheibman, J. (1999). The effect of usage on degree of constituency: The reduction of don’t in American English. Linguistics, 37, 575–596.
Campbell, K. L., & Tyler, L. K. (2018). Language-related domain-specific and domain-general systems in the human brain. Current Opinion in Behavioral Sciences, 21, 132–137.
Carcagno, S., & Plack, C. J. (2011). Subcortical plasticity following perceptual learning in a pitch discrimination task. Journal of the Association for Research in Otolaryngology, 12, 89–100.
Casini, L., Pech-Georgel, C., & Ziegler, J. C. (2018). It’s about time: Revisiting temporal processing deficits in dyslexia. Developmental Science, 21, e12530.
Chandrasekaran, B., Sampath, P. D., & Wong, P. C. (2010). Individual variability in cue-weighting and lexical tone learning. The Journal of the Acoustical Society of America, 128, 456–465.
Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497–505.
Crossley, S. A., & McNamara, D. S. (2009). Computational assessment of lexical differences in L1 and L2 writing. Journal of Second Language Writing, 18(2), 119–135.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2015). Assessing lexical proficiency using analytic ratings: A case for collocation accuracy. Applied Linguistics, 36(5), 570–590.
Crossley, S. A., Skalicky, S., Kyle, K., & Monteiro, K. (2019). Absolute frequency effects in second language lexical acquisition. Studies in Second Language Acquisition, 41(4), 721–744.
Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31(2), 218–236.
De Pijper, J. R., & Sanderman, A. A. (1994). On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues. The Journal of the Acoustical Society of America, 96(4), 2037–2047.
DeKeyser, R. M. (2013). Age effects in second language learning: Stepping stones toward better understanding. Language Learning, 63, 52–67.
Demuth, K., & McCullough, E. (2009). The prosodic (re)organization of children’s early English articles. Journal of Child Language, 36(1), 173–200.
Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups: A 7-year study. Language Learning, 63(2), 163–185.
Doughty, C. J. (2019). Cognitive language aptitude. Language Learning, 69, 101–126.
Douglas, S., & Willatts, P. (1994). The relationship between musical ability and literacy skills. Journal of Research in Reading, 17, 99–107.
Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A destressing “deafness” in French? Journal of Memory and Language, 36(3), 406–421.
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24, 143–188.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity, and accuracy in L2 oral production. Applied Linguistics, 30, 474–509.
Ellis, R., Loewen, S., & Erlam, R. (2006). Implicit and explicit corrective feedback and the acquisition of L2 grammar. Studies in Second Language Acquisition, 28(2), 339–368.
Espy-Wilson, C. Y., Boyce, S. E., Jackson, M., Narayanan, S., & Alwan, A. (2000). Acoustic modeling of American English /r/. The Journal of the Acoustical Society of America, 108, 343–356.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39, 399–423.
Flege, J., & Liu, S. (2001). The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition, 23, 527–552.
Flege, J. E. (2018). It’s input that matters most, not age. Bilingualism: Language and Cognition, 21(5), 919–920.
Foster, P., & Tavakoli, P. (2009). Native speakers and task performance: Comparing effects on complexity, fluency, and lexical diversity. Language Learning, 59, 866–896.
Foster, P., & Wigglesworth, G. (2016). Capturing accuracy in second language performance: The case for a weighted clause ratio. Annual Review of Applied Linguistics, 36, 98–116.
Foxton, J. M., Talcott, J. B., Witton, C., Brace, H., McIntyre, F., & Griffiths, T. D. (2003). Reading skills are related to global, but not local, acoustic pattern perception. Nature Neuroscience, 6, 343–344.
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67, 155–179.
Goad, H., & White, L. (2019). Prosodic effects on L2 grammars. Linguistic Approaches to Bilingualism, 9(6), 769–808.
Goldschneider, J. M., & DeKeyser, R. M. (2001). Explaining the “natural order of L2 morpheme acquisition” in English: A meta-analysis of multiple determinants. Language Learning, 51(1), 1–50.
Goswami, U. (2015). Sensory theories of developmental dyslexia: Three challenges for research. Nature Reviews Neuroscience, 16, 43–54.
Goswami, U., Wang, H. L. S., Cruz, A., Fosker, T., Mead, N., & Huss, M. (2011). Language-universal sensory deficits in developmental dyslexia: English, Spanish, and Chinese. Journal of Cognitive Neuroscience, 23, 325–337.
Gregory, M. L., Raymond, W. D., Bell, A., Fosler-Lussier, E., & Jurafsky, D. (1999). The effects of collocational strength and contextual predictability in lexical production. In Chicago Linguistic Society, 35, 151–166.
Grube, M., Kumar, S., Cooper, F. E., Turton, S., & Griffiths, T. D. (2012). Auditory sequence analysis and phonological skill. Proceedings of the Royal Society B: Biological Sciences, 279, 4496–4504.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (1998). Multivariate data analysis. Prentice Hall.
Hämäläinen, J. A., Salminen, H. K., & Leppänen, P. H. (2013). Basic auditory processing deficits in dyslexia: systematic review of the behavioral and event-related potential/field evidence. Journal of Learning Disabilities, 46(5), 413–427.
Hellman, A. B. (2011). Vocabulary size and depth of word knowledge in adult‐onset second language acquisition. International Journal of Applied Linguistics, 21(2), 162–182.
Hopp, H., & Schmid, M. S. (2013). Perceived foreign accent in first language attrition and second language acquisition: The impact of age of acquisition and bilingualism. Applied Psycholinguistics, 34(2), 361–394.
Hornickel, J., & Kraus, N. (2013). Unstable representation of sound: A biological marker of dyslexia. Journal of Neuroscience, 33, 3500–3504.
Hyltenstam, K. (1988). Lexical characteristics of near-native second-language learners of Swedish. Journal of Multilingual & Multicultural Development, 9, 67–84.
Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10(2), 135–159.
Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E., Kettermann, A., & Siebert, C. (2003). A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition, 87(1), B47–B57.
Jasmin, K., Dick, F., Holt, L., & Tierney, A. T. (2019). Tailored perception: Individuals’ speech and music perception strategies fit their perceptual abilities. Journal of Experimental Psychology: General, 149(5), 914–934.
Jasmin, K., Dick, F., Holt, L. L., & Tierney, A. (2020). Tailored perception: Individuals’ speech and music perception strategies fit their perceptual abilities. Journal of Experimental Psychology: General, 149, 914.
Jasmin, K., Sun, H., & Tierney, A. T. (2020). Effects of language experience on domain-general perceptual strategies. Cognition, 206(104481), 1–14.
Jia, G., & Aaronson, D. (2003). A longitudinal study of Chinese children and adolescents learning English in the United States. Applied Psycholinguistics, 24(1), 131–161.
Jiang, N. (2000). Lexical representation and development in a second language. Applied Linguistics, 21, 47–77.
Joanisse, M. F., & Seidenberg, M. S. (1998). Specific language impairment: A deficit in grammar or processing? Trends in Cognitive Sciences, 2, 240–247.
Kachlicka, M., Saito, K., & Tierney, A. (2019). Successful second language learning is tied to robust domain-general auditory processing and stable neural representation of sound. Brain and Language, 192, 15–24.
Kempe, V., Bublitz, D., & Brooks, P. J. (2015). Musical ability and non‐native speech‐sound processing are linked through sensitivity to pitch and spectral information. British Journal of Psychology, 106(2), 349–366.
Kidd, G. R., Watson, C. S., & Gygi, B. (2007). Individual differences in auditory abilities. The Journal of the Acoustical Society of America, 122(1), 418–435.
Kim, D., Clayards, M., & Kong, E. J. (2020). Individual differences in perceptual adaptation to unfamiliar phonetic categories. Journal of Phonetics, 81, 100984.
Koizumi, R. (2012). Relationships between text length and lexical diversity measures: Can we use short texts of less than 100 tokens? Vocabulary Learning and Instruction, 1(1), 60–70.
Koizumi, R., & In’nami, Y. (2012). Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens. System, 40(4), 554–564.
Kong, E. J., & Edwards, J. (2016). Individual differences in categorical perception of speech: Cue weighting and executive function. Journal of Phonetics, 59, 40–57.
Kourtali, N. E., & Révész, A. (2020). The roles of recasts, task complexity, and aptitude in child second language development. Language Learning, 70(1), 179–218.
Krizman, J., Slater, J., Skoe, E., Marian, V., & Kraus, N. (2015). Neural processing of speech in children is influenced by extent of bilingual experience. Neuroscience Letters, 585, 48–53.
Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National Academy of Sciences, 97, 11850–11857.
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786.
Lamb, S. J., & Gregory, A. H. (1993). The relationship between music and reading in beginning readers. Educational Psychology, 13, 19–27.
Lee, A. H., & Lyster, R. (2016). The effects of corrective feedback on instructed L2 speech perception. Studies in Second Language Acquisition, 38, 35.
Leppänen, P. H. T., Hämäläinen, J. A., Guttorm, T. K., Eklund, K. M., Salminen, H., Tanskanen, A., … Lyytinen, H. (2012). Infant brain responses associated with reading-related skills before school and at school age. Neurophysiologie Clinique/Clinical Neurophysiology, 42(1–2), 35–41.
Levitt, H. (1971). Transformed up‐down methods in psychoacoustics. The Journal of the Acoustical Society of America, 49(2B), 467–477.
Li, S. (2016). The construct validity of language aptitude: A meta-analysis. Studies in Second Language Acquisition, 38, 801–842.
Lim, S. J., & Holt, L. L. (2011). Learning foreign sounds in an alien world: Videogame training improves non-native speech categorization. Cognitive Science, 35, 1390–1405.
Linck, J. A., Hughes, M. M., Campbell, S. G., Silbert, N. H., Tare, M., Jackson, S. R., … Doughty, C. J. (2013). Hi-LAB: A new measure of aptitude for high-level language proficiency. Language Learning, 63, 530–566.
Martin, K. I., & Ellis, N. C. (2012). The roles of phonological short-term memory and working memory in L2 grammar and vocabulary learning. Studies in Second Language Acquisition, 34, 379–413.
Mora, J. C., & Levkina, M. (2017). Task-based pronunciation teaching and research: Key issues and future directions. Studies in Second Language Acquisition, 39, 381–399.
Mueller, J. L., Friederici, A. D., & Männel, C. (2012). Auditory perception at the root of language learning. Proceedings of the National Academy of Sciences, 109(39), 15953–15958.
Muñoz, C. (2014). Contrasting effects of starting age and input on the oral performance of foreign language learners. Applied Linguistics, 35, 463–482.
Nicolay, A. C., & Poncelet, M. (2013). Cognitive advantage in children enrolled in a second-language immersion elementary school program for 3 years. Bilingualism: Language and Cognition, 16, 597.
Norris, D., & McQueen, J. M. (2008). Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review, 115, 357.
Omote, A., Jasmin, K., & Tierney, A. (2017). Successful non-native speech perception is linked to frequency following response phase consistency. Cortex, 93, 146–154.
Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading: An eye-tracking study. Studies in Second Language Acquisition, 38(1), 97–130.
Raz, N., Willerman, L., & Yama, M. (1987). On sense and senses: Intelligence and auditory information processing. Personality and Individual Differences, 8(2), 201–210.
Rosen, S. (2003). Auditory processing in dyslexia and specific language impairment: is there a deficit? What is its nature? Does it explain anything? Journal of Phonetics, 31, 509–527.
Russo, N. M., Skoe, E., Trommer, B., Nicol, T., Zecker, S., Bradlow, A., & Kraus, N. (2008). Deficient brainstem encoding of pitch in children with autism spectrum disorders. Clinical Neurophysiology, 119, 1720–1731.
Saito, K. (forthcoming). Age effects in spoken second language vocabulary attainment beyond the critical period.
Saito, K. (2015). The role of age of acquisition in late second language oral proficiency attainment. Studies in Second Language Acquisition, 37(4), 713–743.
Saito, K. (2019). To what extent does long-term foreign language education help improve spoken second language lexical proficiency? TESOL Quarterly, 53(1), 82–107.
Saito, K. (2020). Multi-or single-word units? The role of collocation use in comprehensible and contextually appropriate second language speech. Language Learning, 70, 548–588.
Saito, K., Cui, H., Suzukida, Y., Dardon, D., Suzuki, Y., Jeong, H., Revesz, A., & Sugiura, M. (in press). Does domain-general auditory processing uniquely explain the outcomes of second language speech acquisition, even once cognitive and demographic variables are accounted for? Bilingualism: Language and Cognition.
Saito, K., Kachlicka, M., Sun, H., & Tierney, A. (2020). Domain-general auditory processing as an anchor of post-pubertal second language pronunciation learning: Behavioural and neurophysiological investigations of perceptual acuity, age, experience, development, and attainment. Journal of Memory and Language, 115(2), 104168.
Saito, K., Sun, H., Kachlicka, M., Alayo, J. R. C., Nakata, T., & Tierney, A. T. T. (2020). Domain-general auditory processing explains multiple dimensions of L2 acquisition in adulthood. Studies in Second Language Acquisition, 1–30.
Saito, K., Sun, H., & Tierney, A. (2020a). Domain-general auditory processing as a perceptual-cognitive anchor of L2 pronunciation learning in adulthood: A longitudinal study. Applied Psycholinguistics, 41, 1083–1112.
Saito, K., Sun, H., & Tierney, A. T. (2020b). Brief report: Test-retest reliability of explicit auditory processing measures. bioRxiv.
Saito, K., Suzukida, Y., Tran, M., & Tierney, A. T. (2021). Domain-general auditory processing partially explains second language speech learning in classroom settings: A review and generalization study. Language Learning, 71(3), 1–47.
Salsbury, T., Crossley, S. A., & McNamara, D. S. (2011). Psycholinguistic word information in second language oral discourse. Second Language Research, 27(3), 343–360.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Palgrave Macmillan.
Shao, Y., Saito, K., & Tierney, A. (2022). How does having a good ear promote instructed second language pronunciation development? Roles of domain-general auditory processing in choral repetition training. TESOL Quarterly.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25, 193–247.
Shultz, A. A., Francis, A. L., & Llanos, F. (2012). Differential cue weighting in perception and production of consonant voicing. The Journal of the Acoustical Society of America, 132, EL95–EL101.
Skehan, P. (1998). Task-based instruction. Annual Review of Applied Linguistics, 18, 268–286.
Skoe, E., Krizman, J., Anderson, S., & Kraus, N. (2015). Stability and plasticity of auditory brainstem function across the lifespan. Cerebral Cortex, 25, 1415–1426.
Snowling, M. J., Gooch, D., McArthur, G., & Hulme, C. (2018). Language skills, but not frequency discrimination, predict reading skills in children at risk of dyslexia. Psychological Science, 29(8), 1270–1282.
Sun, H., Saito, K., & Tierney, A. (2021). A longitudinal investigation of explicit and implicit auditory processing in L2 segmental and suprasegmental acquisition. Studies in Second Language Acquisition, 43(3), 551–573.
Surprenant, A. M., & Watson, C. S. (2001). Individual differences in the processing of speech and nonspeech sounds by normal-hearing listeners. The Journal of the Acoustical Society of America, 110(4), 2085–2095.
Talcott, J. B., Witton, C., McLean, M. F., Hansen, P. C., Rees, A., Green, G. G., & Stein, J. F. (2000). Dynamic sensory sensitivity and children’s word decoding skills. Proceedings of the National Academy of Sciences, 97, 2952–2957.
Tallal, P. (2004). Improving language and literacy is a matter of time. Nature Reviews Neuroscience, 5(9), 721–728.
Tierney, A., Gomez, J. C., Fedele, O., & Kirkham, N. Z. (2021). Reading ability in children relates to rhythm perception across modalities. Journal of Experimental Child Psychology, 210, 105196.
Toscano, J. C., & McMurray, B. (2010). Cue integration with categories: Weighting acoustic cues in speech using unsupervised learning and distributional statistics. Cognitive Science, 34, 434–464.
Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28, 1–30.
Vafaee, P., & Suzuki, Y. (2020). The relative significance of syntactic knowledge and vocabulary knowledge in second language listening ability. Studies in Second Language Acquisition, 42, 383–410.
VanPatten, B. (2002). Processing instruction: an update. Language Learning, 52, 755–803.
Wallace, M. P. (2020). Individual differences in second language listening: examining the role of knowledge, metacognitive awareness, memory, and attention. Language Learning.
Webb, S., & Nation, P. (2017). How vocabulary is learned. Oxford University Press.
Won, J. H., Tremblay, K., Clinard, C. G., Wright, R. A., Sagi, E., & Svirsky, M. (2016). The neural encoding of formant frequencies contributing to vowel identification in normal-hearing listeners. The Journal of the Acoustical Society of America, 139, 1–11.
Wong, P. C., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics, 28, 565.
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory processing. Psychology and Aging, 27, 410.
Zhang, Y. X., Moore, D. R., Guiraud, J., Molloy, K., Yan, T. T., & Amitay, S. (2016). Auditory discrimination learning: Role of working memory. PLOS ONE, 11, e0147320.

Table 1. Biographical backgrounds of 100 participants

Table 2. Descriptive summary of spoken L2 vocabulary proficiency relative to native benchmark

Table 3. Summary of a five-factor solution based on a factor analysis of spoken L2 lexicogrammar proficiency

Table 4. Summary of stepwise multiple regression models featuring only significant predictors of spoken L2 vocabulary proficiency