We present the first study of the acquisition of the production (Study 1) and perception (Study 2) of voicing contrasts in stops and affricates in child speakers of the North Australian contact language Kriol. Kriol has been reported to exhibit an unusual degree of variability in the implementation of phonological contrasts and lexical items, both within and between speakers (Sandefur, 1979, 1986; Stewart et al., 2018, 2020), and this variation presents Kriol-acquiring children with an unusual ‘moving target’ of acquisition, with significant implications for theories of language acquisition, and our understanding of the phonetics-phonology-lexicon interfaces.
As most language acquisition researchers are unlikely to be familiar with Kriol and with the language contact/Creolistics literature, we first introduce Kriol and the literature on phonological variation and the Creole Continuum, before discussing obstruent acquisition in the sections Development of L1 phonology: Perception and Development of L1 phonology: Production. Finally, we introduce the two studies that we report on (Study 1: stop production, and Study 2: mispronunciation detection).
Kriol (ISO 639-3 rop) is an English-lexified Creole spoken in Northern Australia by approximately 20,000 people (Australian Institute of Aboriginal and Torres Strait Islander Studies/Commonwealth of Australia, 2005), making it the most widely spoken Indigenous language after English/Aboriginal English. Kriol has developed in the past 100 years from contact between speakers of Australian Indigenous languages (often referred to as ‘substrate languages’) and English (often referred to as the ‘superstrate’ language) in areas associated with the pastoral industry in the Northern Territory, Queensland and Western Australia (Harris, 1986; Munro, 2011; Sandefur, 1979).
Phonologically, many Australian Indigenous languages, including those which contributed to Kriol, have ‘long and thin’ consonantal inventories: they have up to six oral places of articulation (labial, lamino-dental, apico-alveolar, apico-post-alveolar, lamino-alveo-palatal, and velar), sometimes with an additional glottal stop phoneme, but famously lack voicing distinctions and fricatives altogether. Some languages employ ‘fortis’ and ‘lenis’ stop consonant contrasts characterized by duration differences, including some of those which contributed to Kriol (Fletcher & Butcher, 2014; Hamilton, 1996).
Instrumental phonetic and perception studies with adult speakers of Kriol (of the variety ‘Roper Kriol’, spoken in the Roper River basin in the Northern Territory; Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019) indicate that the consonantal inventory of Kriol reflects its dual heritage: the ‘long and thin’ inventory of Indigenous Australian languages has become even longer in Kriol, with the incorporation of a glottal fricative /h/, but also thicker, through inclusion of affricates /tʃ dʒ/ and voiceless fricatives /f s ʃ h/ (see Table 1).
Where a phonological stop voicing contrast (e.g., /p/ versus /b/) is implemented, stops differ in voice onset time (VOT) in initial position and in VOT and constriction duration in intervocalic contexts (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019; see also discussion of Light Warlpiri in Bundgaard-Nielsen & O’Shannessy, 2021, though Light Warlpiri is a Mixed Language, not a Creole). The medial constriction duration difference in (Roper) Kriol is like the ‘fortis’-‘lenis’ difference in some of the Kriol substrate languages, namely Ngalakgan, Ngandi, and Ritharrngu (Fletcher & Butcher, 2014): fortis/long stops typically have twice the constriction duration of lenis/short stops. These phonetic characteristics set Kriol apart from English, including Standard Australian English (SAE), where constriction durations of voiced and voiceless pairs differ only by a few milliseconds (Byrd, 1993; Jones & Meakins, 2013), and this distinction is not perceptually relevant. Table 2, from Bundgaard-Nielsen and O’Shannessy (2021), summarises the available VOT information for SAE and (Roper) Kriol.
* Approximate values available only.
Phonological variation and the Creole Continuum
Kriol, like other Creoles (Siegel, 2008, p. 235), has been conceptualised using a (Post-)Creole Continuum Model (DeCamp, 1971). Under this model, Creoles are conceived of as linguistic continua with one end described as ‘basilectal’ (more like the substrate languages) and the other described as ‘acrolectal’ (more like the superstrate language). The continuum model sees speakers as ‘sliding’ up and down the continuum in response to such factors as interlocutors, context, and topic, as in this quote from Sandefur (1986, p. 50), describing Kriol (where ‘heavy’ means ‘closer to basilectal’ and ‘light’ means ‘closer to acrolectal’):
The majority of Kriol words [have] several alternate pronunciations (e.g. jineg, jinek, sinek, sineik, sneik ‘snake’) […] Except for the extreme heavy and light variations of some words, most Kriol speakers control virtually all pronunciations in their active everyday speech. No Kriol speaker speaks with a consistently light pronunciation. […] With few exceptions, every stream of Kriol speech will contain some words with heavy pronunciations and some with light pronunciations. Within the same conversation and even within the same sentence, it is not uncommon for Kriol speakers to use more than one of the pronunciation alternatives.
In the original proposals, the Creole continuum is discussed almost exclusively with respect to the lexicon and morpho-syntax (e.g., DeCamp, 1971; Rickford, 1987; Bickerton, 1973), both domains under (some degree of) speaker control, and domains that are well-known to be exploited for sociolinguistic purposes. In Kriol, by contrast, the primary manifestation of the continuum model is in claims of highly variable phonological specifications for words, and – the focus here – in terms of the phonetic implementation of its obstruent phonology, the latter a domain of speech that is much less likely to be under conscious control, and thus perhaps less likely to be employed as a sociolinguistic marker.
At the basilectal end of the continuum, Kriol is reported to lack voicing and manner contrasts in obstruents (there is only one series of stops and no fricatives or affricates), reflecting the phonological structure of the substrate languages. To exemplify, this has the consequence that both /pi:tʃ/ and /bi:tʃ/ refer to the fruit (‘peach’) and the location (‘beach’) at the basilectal end of the continuum. At the acrolectal end of the continuum, however, Kriol maintains two series of contrasting stops, and contrastive fricatives and affricates, reflecting the phonological structure of the lexifier English. Consequently, /pi:tʃ/ refers to the fruit only, and /bi:tʃ/ to the location only, at the acrolectal end of the continuum. Under this model, speakers thus sometimes implement stop voicing and stop-fricative contrasts, and sometimes not, even within a single utterance and even for the same lexical items (Sandefur, 1979, 1985, 1986; Stewart et al., 2018, 2020). Recent work has questioned this unusual ‘on-off’ phonological system and suggested that most, if not all, of this phonological variation can be accounted for, not by the continuum model, but by differences in (first and second) language acquisition histories and use patterns (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019; cf. Siegel, 2008: Chapter 9 on Tok Pisin in Papua New Guinea). Such an explanation is not incompatible with observations of variation in the use of Kriol in Kriol-speaking communities.
While all languages exhibit some degree of phonetic variation in the realisation of their phonological inventories, the phonological variation proposed for Kriol constitutes an unusual learning challenge for Kriol-acquiring children, because the interchangeable use of a basilectal phonological inventory and a larger acrolectal inventory requires infants and children to manage a high degree of unpredictable phonological (and lexical) variation in the input and acquire a similar system of ‘fluid’ L1 phonology and flexible lexical specifications. In other words, if we take the Creole Continuum at face value in the domain of phonology, Kriol-acquiring children must keep their phonological options open by acquiring a highly underspecified phonological system, until and unless regularity in the implementation of voicing and manner contrasts (and stable lexical representations) develops. This stabilisation has been suggested to happen through regular contact and rapprochement with the lexifier (i.e., de-creolisation), through formal schooling (Sandefur, 1979; Stewart et al., 2018, 2020), though it is not clear how the process is assumed to work. We speculate that the argument would be that phonological contrasts at the acrolectal end of the continuum stabilise because contact with the lexifier reinforces them, as opposed to being a situation in which transfer from English introduces lexifier contrasts, as Kriol stop voicing phonetics differs from English by relying on constriction duration as well as VOT (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019).
The reported variation in Kriol phonology presents a significant challenge to theories of L1 phonological acquisition and the typical predictable intergenerational transmission of language. Most, if not all, theories of first (and second) language segmental acquisition, such as the Perceptual Assimilation Model (PAM: Best, 1995), the Speech Learning Model (SLM: Flege, 1991), and the Native Language Magnet model (NLM: Kuhl et al., 2008), assume that languages have concrete lexical specifications of words and stable phonological inventories. These models of acquisition differ in how they account for first language segmental attunement and the formation of a phonological system, but they all assume that language acquisition relies on children identifying systematic, though not always simple, relationships between the phonemes of the L1, as they participate in lexemes, and their phonetic realisations as they are presented in the input from caregivers and the community. Thus, first language acquisition under the Creole Continuum Model – in the domain of phonology – constitutes a very unusual scenario.
There has been limited investigation of the acquisition of L1 phonology by Creole-acquiring children (but see discussion of two studies below in Development of L1 phonology: Production), and to our knowledge none with children acquiring Kriol, and no proposal for L1 acquisition under a Continuum model is available. Such research would have implications for language acquisition theory, with a view to understanding the effect of variation in the input; it would broaden the typological scope of the existing empirical work; and, equally importantly, it would have implications for language description, language policy, and education in language contact scenarios. Decades of predominantly laboratory-based research with a range of (predominantly European) languages have, however, demonstrated that infants and toddlers acquire their native language phonology in terms of both perception and production within the first few years of life, and that a stable phonological system is essential for word-learning (see discussion immediately below). In what follows, we review the acquisition of phonology with a particular view to the role of variability in the input, and the role of vocabulary expansion for L1 phonological development.
Development of L1 phonology: Perception
Perception research with infants acquiring a wide range of (non-contact) languages shows that infants’ phonological category formation is well underway by 6-10 months of age, with categorical perception of native consonants (including voicing distinctions) acquired first, and native vowels towards the end of the first year of life (see, for instance, Best & McRoberts, 2003; Best et al., 1988; Kuhl et al., 2008; Polka et al., 2001; Rivera-Gaxiola et al., 2005; Werker & Lalonde, 1988; Werker & Tees, 1984). It is also, presumably, regularity and predictability in the shape of lexemes, and in the phonetic realisation of the L1 phonemes, that allows typically developing children, in their second year of life, to begin to recognise familiar words also in novel and unfamiliar dialects/accents (e.g., Best et al., 2009; Mulak et al., 2013; White & Aslin, 2011).
In light of this research, how one might conceptualise phonological acquisition in a language without a stable phonological inventory and stable lexemes is a significant question, and it may be intuitively appealing to consider the reported variation in Kriol as similar to the type of challenge faced by bilingual children exposed to partially overlapping phonologies and cognate lexical items in their languages. In bilingual environments, however, research shows that infants can tell at least some aspects (rhythmic, phonological/phonetic) of those languages apart soon after birth (Bosch & Sebastián-Gallés, 2001; Byers-Heinlein et al., 2010), suggesting that children recognise the acoustic-phonetic or articulatory ‘signature’ of each language as distinct, and that they do not interpret the difference between the two languages as a dimension of variation within one language system. Presumably this is because the phonological and lexical differences (even in cognate words) are structured and predictable in the input to bilingual infants; the structure and predictability make this type of input fundamentally different to the type of unpredictable variation between contrast maintenance and a lack of contrast which is supposed to hold for Kriol.
Learning two phonological systems at the same time may, however, take a little longer than it takes to learn just one (Bosch & Sebastián-Gallés, 2003; Burns et al., 2007; Graf Estes & Hay, 2015). This is hardly surprising given the increased complexity in the task and the fact that dual language input reduces the quantity of input in each language; input frequency matters even for monolingual children (Anderson et al., 2003). Importantly, for the present paper, this suggests that children respond to greater variation in the input – here in the form of input from two different languages – by extending the ‘data collection period’ before they settle on an analysis/inventory for each language. Consequently, we might expect that the reported variation in Kriol may induce a similarly extended data collection, and a longer period with substantial variation in the realisation of individual phonemes between and within children, though the onset of word-learning may provide a natural end to the extent of this period.
The vowel perception of Spanish–Catalan bilingual infants, in particular, highlights the role that word-learning plays in encouraging an infant to work out and ‘settle’ on a phonological inventory, again assuming that words have canonical forms, and are composed of phonological segments selected from a stable phonological system, and with language specific phonetic realisations. Infants exposed to Spanish and Catalan from birth successfully discriminate and categorise Catalan /e/-/ɛ/, a contrast not shared with Spanish, at 12 months of age (Bosch & Sebastián-Gallés, 2003). However, Catalan–Spanish bilingual children lose this ability in the second year of life, when word-learning accelerates, and fail to notice mispronunciations of these vowels in familiar words. Recovery in the ability to discriminate the Catalan /e/-/ɛ/ contrast is seen in the third year of life, but only for children who are Catalan dominant and presumably have more consistent evidence that /e/-/ɛ/ are contrastive (Ramon-Casas et al., 2009). Spanish-dominant children face a longer journey to success, and it is possible that this reflects early lexical encoding consistent with Spanish rather than Catalan, and limited or unreliable evidence of contrastiveness (Ramon-Casas et al., 2023). Research with adult Spanish–Catalan bilinguals further shows that Spanish-dominant adults are also less accurate and more variable in /e/-/ɛ/ discrimination (Bosch & Ramon-Casas, 2011; Lleó et al., 2008). This suggests that the effects of early lexical encoding persist for years even in continued language contact environments.
We are not arguing here that Kriol-speaking children are raised as Kriol–English bilinguals and experience a conflict in the perception of the phonology of two related but distinct languages like Spanish–Catalan bilinguals, nor do we focus on vowel acquisition. The insights from the research with Spanish–Catalan bilinguals, however, are relevant because they highlight the importance of the quality and quantity of input to children for successful phonological development, and the role of vocabulary acquisition in segmental acquisition. The (atypical, perhaps) Spanish–Catalan bilingual research is particularly relevant here (as opposed to research investigating simultaneous acquisition of typologically distant languages) because it suggests that – in case of confusion or conflict in partially overlapping input – children adopt the smaller system rather than the bigger one. For the bilingual Spanish–Catalan infants, this results in a Spanish vowel inventory unless the evidence is particularly good that Catalan has an /e/-/ɛ/ contrast (Ramon-Casas et al., 2009, 2023). If Kriol, and thus Kriol input to children, is highly variable and well-captured by a continuum model, we might hypothesise that Kriol-acquiring infants/young children will ‘settle’ on the most minimal phonological system possible (namely, one without a phonological voicing or manner distinction in obstruents), as they may ignore evidence of these distinctions, if the input is too varied or too inconsistent. It is such a system that we might then expect to see implemented also in production.
Development of L1 phonology: Production
While the relationship between speech perception and production is not perfectly understood, it is generally assumed that children’s production of phonological segments (and thus words) develops based on their perception of the input. Becoming a speaker of a language, however, involves not just the recognition of what should be produced, but also how to produce it, and target fidelity in child language production lags relative to adult-like segmental perception. Child productions are characterised by differences in child versus adult word forms, and in phonological processes such as segment deletion, insertion, and metathesis.
In languages including English, Mandarin, Standard Greek, Cypriot Greek and Korean, research has shown that children produce contrastive VOT from an early age (Kim & Stoel-Gammon, 2009; Okalidou et al., 2010; Yang, 2021), though they exhibit more variation in the realisation of voiceless stops than adults and may try to avoid using words with voiceless stop onsets because they are harder to produce (Lowenstein & Nittrouer, 2008; Millasseau et al., 2021). Differences in the age of attainment of two- vs three-way contrasts (e.g., English, Mandarin, and Standard Greek vs Korean and Cypriot Greek) further suggest that it is harder to acquire a three-way distinction. Despite this greater variability in child productions, cross-linguistic comparisons of age of target-like production of phonemes also suggest that most (though not all) children largely produce their native phonemes correctly by their fifth birthday, with articulatorily more complex segments being acquired later than articulatorily simple segments (McLeod & Crowe, 2018). While the timing of this achievement in production is later than what can be demonstrated for segmental perception, the patterns indicate clearly that children very quickly become not just effective listeners but also effective producers of their L1, and many of the phonological and phonetic processes that impede production can be ascribed to differences in fine-motor control between adults and children and differences in the coordination of speech gestures (Millasseau et al., 2021). They are not typically ascribed to difficulties in perceiving the contrast in question.
Two recent studies of the acquisition of phonology, including stop voicing distinctions, by Creole-speaking children – the French-lexified Haitian Creole (Archer et al., 2018) and English-lexified Jamaican Creole (León et al., 2022) – are particularly relevant to what we might expect from child Kriol-speakers in terms of production. These studies have both taken the (post-)Creole continuum and its associated variation as a theoretical given, yet the results from the studies indicate consistency in the production (and, we assume, also perception) of the children. Archer et al. (2018) indicate that young Haitian Creole-speaking children produce all but three (/ʧ ŋ ɲ/) Haitian Creole phonemes by the age of four, including voicing distinctions, in a manner that does not indicate the degrees of variation or idiosyncrasies predicted under a continuum model. León et al. (2022) investigated differences between child and adult productions of Jamaican Creole phonemes in children aged 3;4–5;11 years and found that both child and adult participants produced voicing-based distinctions, but that the adults in general produced voiced stops with negative VOT, while children produced voiced stops with an average of just over 0 ms positive VOT. Developmental and other population differences in this study considered, these results indicate that child speakers are acquiring the language of their community, including a stable phonological system and stable lexemes. The results of both studies are thus consistent with voicing contrast acquisition by children acquiring non-Creole languages discussed above, differing in the number of stop voicing categories and in the phonetic specifications of the stops.
The results invite closer investigation of the productions of Kriol-acquiring children, to determine whether and how they implement a voicing distinction, whether they have flexible phonological specifications for Kriol words, and importantly, whether contact with the lexifier at school induces differences in linguistic behaviour (and presumably the phonological inventory of Kriol).
The present studies
We present two studies of Child Kriol VOT production and perception. Study 1: Child Kriol stop production examines Kriol stop production by 13 L1 Kriol-speaking children aged 4-7 years, to examine whether there is evidence of systematic implementation of a voicing distinction in stops and affricates, and stable lexical targets, despite the reported (extreme) variability in the input. The results from Study 1 indicate that child speakers of Kriol implement a VOT and constriction-based distinction in stops and affricates in stable lexical targets, consistent with the pattern observed for adult Kriol speakers in Baker et al. (2014), but inconsistent with reports of variable implementation of stop voicing and stop-fricative contrasts within and between speakers of Kriol. The results offer no evidence that schooling induces a voicing distinction in stops and affricates, though we observe some age-related differences in the implementation of VOT in alveolar and velar stop contrasts.
Study 2: Mispronunciation detection examines the same Kriol-speaking children’s ability to detect a range of mispronunciation types in familiar words, such as the shift from /duwa/ (Eng. ‘door’) to /tuwa/. This study complements Study 1, with evidence that Kriol-speaking children recognise that Kriol words have canonical forms and accept words produced with the canonical shape while rejecting words that deviate in voicing, manner or place of articulation, as well as combinations of the three. The results also indicate that children’s ability to reject mispronunciations improves with age for all tested mispronunciation types, not just those based on voicing mispronunciations that have been proposed to be variable and proposed to stabilise under influence from English. This improvement likely reflects the fact that older children have greater cognitive capacity, improved phonemic awareness and better test-taking skills, rather than an effect of extended exposure to English.
Together, the studies suggest that Kriol-speaking children acquire a single, stable phonological inventory, rather than a continuum of phonologies, and lexical items with canonical phonological specifications. This is consistent with what is assumed for non-Creole languages and provides evidence that input characterised by significant phonetic variation, and large numbers of L2 users, does not disrupt L1 phonological development.
Study 1: Child Kriol stop production
We recruited 13 children (7 female, 6 male; age range = 4;8 to 7;0, M age = 72.5 months/6 years) for the present study. At the time of testing, all participants lived in Beswick/Wugularr (see Figure 1): a remote, predominantly (95.7%) Indigenous community in the Northern Territory, Australia, with a population of 515 individuals (Australian Bureau of Statistics, 2022). All children were L1 Kriol speakers, raised by Kriol-speaking caregivers. In addition to Kriol, most parents and caregivers also had receptive and some expressive language competence in one or more ‘traditional’ (i.e. Indigenous, pre-contact) Australian languages, and to some extent in Australian English and/or Aboriginal English. All children were Indigenous Australian, raised in extended family households, including grandparent(s), parents, aunts/uncles, siblings, and cousins: the average household in Beswick has six permanent occupants (Australian Bureau of Statistics, 2022), but extended visits from relatives are common. Child rearing in Indigenous communities in Australia is often communal, with grandmothers and aunts (biological or classificatory), in particular, playing significant roles, and sometimes taking the role of main caregiver for extended periods (for a discussion, see, for instance, Lohoar et al., 2014). According to the Australian Bureau of Statistics, <5% of the Beswick population speak L1 Standard Australian English (SAE) (Australian Bureau of Statistics, 2022), and children are typically first exposed to SAE once they commence formal schooling, although most media such as television are in SAE also. Formal education levels in the community are very low by Western standards; 10% of the adult (15+) population in Beswick has completed Year 12 (High School) as their highest level of education (Australian Bureau of Statistics, 2022).
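For readers unfamiliar with the years;months notation used for child ages above, the following minimal sketch shows the conversion arithmetic. The function names are illustrative only and are not part of the study's materials; the ages used are those reported in the text.

```python
def age_to_months(age: str) -> int:
    """Convert a 'years;months' age string (e.g. '4;8') to total months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def months_to_years(months: float) -> float:
    """Convert an age in months to (fractional) years."""
    return months / 12


youngest_months = age_to_months("4;8")    # 4 * 12 + 8 = 56 months
oldest_months = age_to_months("7;0")      # 7 * 12 + 0 = 84 months
mean_age_years = months_to_years(72.5)    # just over 6 years
```

On this arithmetic, the reported mean of 72.5 months corresponds to roughly 6;0, which matches the '6 years' given in the text.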
All participants attended the Preschool (one participant) or Primary school program (grades 0-3, 12 participants) at Wugularr Primary School in Beswick. All were acquiring English as a second language (L2) at school: classroom teaching at Wugularr School is conducted in English, generally by non-Indigenous staff. Indigenous (Kriol-speaking) support staff are, however, often present in the classrooms and assist with interpreting teacher instructions. We do not have school attendance information for the children and therefore cannot assess the amount of L2 English input each child has had via school. Instead, we use child age as a proxy for the quantity of L2 input received, and we assume that the older a child is, the more English exposure they have had. We acknowledge that this is an imprecise measure: research in similar schools in the Northern Territory suggests that attendance may only be around 50-60% on average (based on Department of Education figures quoted in Hill, 2008), so each child’s quantity of exposure to English as an L2 must be considered in this light.
All children were reported to have normal hearing, but we did not conduct a formal hearing screening to verify this. Recurrent otitis media (middle ear infection) is common in many Indigenous communities, and some of the children may have an undiagnosed hearing impairment, despite none being reported. Participants were recruited by word of mouth, through existing contacts in the community. Parental consent and child assent were obtained. Each child was rewarded with a small toy (either a book, toy car, doll, or textas/colouring pencils).
It was the research policy that children who were invited but did not wish to participate were also offered a toy; no child declined participation, and several requested repeat testing. To avoid the possibility of parental coercion, we did not compensate parents/caregivers for their time. The research was approved by the University of Melbourne Human Research Ethics Committee, approval number #1035119.
To elicit the voiced and voiceless stops /p t k b d ɡ/ in word-initial and word-medial positions, in a wide range of vowel contexts, as well as affricates /tʃ dʒ/ in word-initial position, we selected 36 depictable nouns, in consultation with two literate adult L1 Kriol speakers, one who is trained in early childhood education, and another who has been involved in the creation of Kriol literacy materials for children. The consultants deemed that the selected words (see Table 3) were highly familiar words to young children. Figure 2 provides four examples of the visual material presented. Many words provided more than one target consonant. To ensure that the children would not identify word-initial/-medial stop production as the measure of interest in this study, nine filler items were included (in IPA): /san/ ‘sun’, /maus/ ‘mouse’, /eɡ/ ‘egg’, /kreb/ ‘crab’, /wotʃ/ ‘watch’, /masil/ ‘freshwater clam’, /li:f/ ‘leaf’, /fiʃ/ ‘fish’, /sisis/ ‘scissors’. Additionally, practice items /dres/ ‘dress’, /haus/ ‘house’ and /baik/ ‘bike’ were presented before the elicitation task proper to ensure that the children understood the procedure. Data from filler and practice items were not included in the analyses. We avoided consonant clusters wherever possible; the word /sebɻa/ ‘zebra’ was the only exception due to difficulty identifying depictable nouns with word-medial /b/ in Kriol.
We elicited the Kriol lexical items in the following way: each participant was seated at a table in a quiet room in a house in Beswick, in front of a monitor displaying a PowerPoint presentation containing, first, the three practice items, and then a pseudo-randomised list of the elicitation items, each item displayed one at a time, with the pre-recorded Kriol prompt Wanem dijan? (‘What is this?’), spoken in a child-directed register by a female L1 Kriol speaker, in her 50s (from a different community, and we assume unfamiliar to the children). The task was explained to the children in Kriol and English, by the first and second authors.
When a child provided a correct response, the experimenters gave positive feedback. Incorrect responses (for instance, saying ‘bottle’ for ‘water’) received feedback about the desired item name in Kriol: all productions uttered after a prompt was given were excluded from analysis, to avoid accommodation. Each picture was displayed for as long as the child wished, and if it became clear that a child would not provide a response to a particular image, the item name was provided, the trial was skipped, and the researcher moved to the next elicitation item. All responses were recorded using a PMD660 Marantz flash-RAM digital recorder with a DPA d:fine headset microphone. All recordings had a 16-bit sampling depth with a sampling rate of 44.1 kHz.
We extracted a total of 1200 analysable consonants from the 13 children’s productions (one word/item/child, except where a child missed a trial). This allowed for the extraction of 1200 VOT measurements: 818 from word-initial stops and affricates, and 382 from word-medial consonants (medial affricates were not targeted, and only a handful occurred). We also extracted 382 word-medial constriction duration measurements; word-initial constriction duration cannot be reliably identified for words produced in isolation.
Tokens which were incompletely produced or interrupted by background noise, laughter, or contact with the microphone were excluded from our analysis. Only items where the child produced the intended utterance (or a semantically related Kriol word which included the original target consonant in the intended position: e.g. /kap/ ‘cup’ for ‘coffee’, /piɡpiɡ/ or /piɡipiɡi/ for ‘pig’, /beːɖbeːɖ/ or /beːɖ/ for ‘bird’, /siːteːdul/ ‘sea turtle’ for ‘turtle’) were included in this analysis. Excluding non-target responses resulted in a loss of 3.1% of the VOT data and 1.98% of the constriction duration data.
The acoustic recordings were analysed in Praat (Boersma & Weenink, 2013). Words and relevant consonants were segmented manually by a phonetically trained individual, and VOT (for initial and medial consonants) and constriction duration measurements (for medial consonants) were extracted using a Praat script. VOT was defined as the time between the burst/stop release and the onset of voicing. Voicing onset was marked at the zero crossing before the second clear periodic wave. Constriction duration was measured as beginning at the initial stop closure (the end of the preceding vowel’s clear F2) and ending at the stop burst.
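The two measurement definitions above can be illustrated with a minimal Python sketch. It is not the Praat script used in the study: the class, field names, and time stamps are hypothetical, and the sketch simply assumes that the three landmarks (closure onset, release, voicing onset) have already been hand-labelled in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopAnnotation:
    """Hand-labelled landmarks (in seconds) for one stop token.

    Field names are hypothetical, not taken from the authors' Praat script.
    """
    release: float                         # burst / stop release
    voicing_onset: float                   # zero crossing before the second clear periodic cycle
    closure_onset: Optional[float] = None  # end of the preceding vowel's clear F2 (medial stops only)


def vot_ms(tok: StopAnnotation) -> float:
    """VOT = voicing onset minus release; negative values indicate pre-voicing."""
    return (tok.voicing_onset - tok.release) * 1000.0


def constriction_ms(tok: StopAnnotation) -> Optional[float]:
    """Constriction duration = release minus closure onset; None if no closure landmark."""
    if tok.closure_onset is None:
        return None
    return (tok.release - tok.closure_onset) * 1000.0


# Hypothetical tokens: a word-initial voiceless stop and a word-medial pre-voiced stop.
initial_p = StopAnnotation(release=0.512, voicing_onset=0.570)
medial_b = StopAnnotation(release=1.240, voicing_onset=1.232, closure_onset=1.150)

print(round(vot_ms(initial_p), 1), constriction_ms(initial_p))
print(round(vot_ms(medial_b), 1), round(constriction_ms(medial_b), 1))
```

Word-initial tokens carry no closure landmark in this sketch, mirroring the fact that constriction duration is only measured word-medially.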
Study 1: Predictions
On the basis of the review of Kriol phonology and phonetics, and the literature on phonological acquisition, we suggest that a number of plausible outcomes of Study 1 are possible. Predictions based on the Creole Continuum Model would likely involve a great degree of variation in stop realisation both within and between child speakers, and within and between lexical items, as speakers could presumably be in a range of positions on the continuum. We might also see effects of age (as a proxy for exposure to English) such that older children produce VOT contrasts more reliably (or with English-like constriction durations) than younger children, due to de-Creolisation via formal schooling.
Predictions based on the literature on (predominantly non-Creole) phonological acquisition would suggest that, with appropriate caveats such as differences in the ratio of, for instance, L1 to L2 Kriol input, and age-based differences in motor skills, Kriol-acquiring children will be able to manage (phonetic) variability in the input and settle on a single phonological inventory (rather than two or even more), and canonical word forms. This single phonological inventory would be implemented with reasonable consistency across the speaker group in both perception and production, reflecting a shared knowledge of Kriol phonology (even if the child speakers’ stop realisations exhibit greater degrees of variation than what is observed in adult Kriol).
Study 1: Results
Group results are presented in Figure 3. We analysed the word-initial VOT values of the produced plosive consonants using linear mixed-effects modelling (LMM), where participants and lexical items were assigned as random intercepts, and the children’s age (standardised: Mean = 0, SD = 1) was included as a covariate. Random slopes were not included due to a relatively small number of observations per condition (and thus models would fail to converge, as our initial analysis suggested). A total of three models were fitted, see Table 4. Model 1 took stop voicing as the only fixed factor (apart from age), while Model 2 included both stop voicing (voiced and voiceless, two levels) and POA (labial, alveolar, post-alveolar and velar, four levels) as categorical predictors. The categorical factors were treatment-coded, e.g., for stop voicing, ‘voiced’ = 0 (reference level), and ‘voiceless’ = 1. Model 3 also considered the interaction between the two factors (Voicing category and POA). The models were compared for their performance based on Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) metrics, and Model 3 achieved the lowest AIC and BIC values, indicating the best goodness-of-fit. According to Model 3, voiced stops had a close-to-zero VOT estimate (5.518 ms), while voiceless stops had longer VOTs (β = 46.636, p < .0001). The difference between voiced and voiceless post-alveolars is 92ms (p = .0008) larger than the difference between voiced and voiceless labials (the reference level), in a cross-linguistically typical pattern. 
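The model comparison step can be sketched as follows. The log-likelihoods are invented for illustration and are not the values behind Table 4. The parameter counts assume three nested models with two random-intercept variances and one residual variance. The sample size assumes that all 818 word-initial VOT measurements entered the models.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


N_OBS = 818  # assumed number of observations (word-initial VOT tokens)

# (hypothetical log-likelihood, assumed number of estimated parameters)
candidates = {
    "Model 1": (-4100.0, 6),   # intercept, voicing, age + 2 random intercepts + residual
    "Model 2": (-4080.0, 9),   # adds three POA contrasts
    "Model 3": (-4040.0, 12),  # adds three voicing x POA interaction terms
}

scores = {name: (aic(ll, k), bic(ll, k, N_OBS)) for name, (ll, k) in candidates.items()}

for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")

best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])
print("Lowest AIC:", best_aic, "| lowest BIC:", best_bic)
```

Because BIC penalises each additional parameter by ln(n) rather than 2, a more complex model that wins on both criteria, as Model 3 does here, is favoured even under the stricter penalty.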
Finally, we calculated the effect size for the VOT differences between voiced and voiceless consonants for each POA: Large-sized differences (Cohen, 1988) were found between /b/ and /p/ (labials), Cohen’s d = 1.35, 95% CI = [0.78, 1.92], between /d/ and /t/ (alveolars), d = 1.59, 95% CI = [0.92, 2.26], between /dʒ/ and /tʃ/ (post-alveolars), d = 4.00, 95% CI = [2.60, 5.42], and between /ɡ/ and /k/ (velars), d = 1.63, 95% CI = [0.80, 2.45]. Taken together, the results indicate that Kriol-speaking children produce a clear VOT-based distinction for voiced and voiceless stops at the word-initial position, and that the distinction is present in all tested POAs.
Note: * = p < .05, ** = p < .01, *** = p < .001.
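For readers unfamiliar with the effect size metric, the sketch below shows one common way of computing Cohen’s d with an approximate 95% confidence interval. The text does not specify which estimator or interval method was used, so the pooled-SD formula and the large-sample normal approximation here are assumptions. The VOT values are invented.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def d_confidence_interval(d, na, nb, z=1.96):
    """Approximate CI for d from its large-sample standard error."""
    se = math.sqrt((na + nb) / (na * nb) + d * d / (2 * (na + nb)))
    return d - z * se, d + z * se


# Hypothetical VOT values in ms (not data from the study)
voiceless = [52, 61, 48, 70, 55, 66]
voiced = [8, 12, 5, 15, 10, 4]

d = cohens_d(voiceless, voiced)
lo, hi = d_confidence_interval(d, len(voiceless), len(voiced))
print(f"d = {d:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Under Cohen’s (1988) conventions, values of d around 0.2, 0.5, and 0.8 are read as small, medium, and large, respectively.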
For VOTs in word-medial plosives, we also fitted three models with different levels of complexity, see Table 5. Since the dataset contained a very small number of observations of post-alveolar affricates at the medial position (N = 5), only labials, alveolars, and velars were considered in these models (N = 377). By comparing the AIC and BIC metrics, Model 3 showed the best goodness-of-fit, although it originally had a boundary (singular) fit due to the relatively small dataset, and thus the random factor ‘lexical item’ was removed for the model to converge. According to Model 3, voiced labial stops had a close to zero estimate of VOT of -9.695 ms, while voiceless stops tended to have long VOTs (β = 80.953, p < .0001). Additionally, voiced alveolar and velar stops tended to have longer VOTs than labial voiced stops (β = 19.762 and 38.054, p = .0408 and p < .0001, respectively). Significant interactions were found between voicelessness and the alveolar stops (β = -63.634, p < .0001), as well as between voicelessness and the velar stops (β = -61.069, p < .0001), indicating that the VOT difference was more salient in the labial context. As with the word-initial stops, we calculated the effect size of VOT differences between voiced and voiceless stops: a large-sized difference was found between /b/ and /p/ (labials), Cohen’s d = 1.89, 95% CI = [1.54, 2.24], while the differences were only medium-sized between /d/ and /t/, d = 0.38, 95% CI = [-0.17, 0.94], and between /k/ and /ɡ/, d = 0.44, 95% CI = [0.11, 0.77]. These results suggest that Kriol children had a VOT-based distinction between voiced and voiceless stops word-medially, but the magnitude of difference differs across different POAs, such that the difference was more robust in the labial stops than the alveolar and velar stops. 
Additionally, the covariate (standardised age) showed a positive coefficient (β = 9.370, p = .0246), indicating that older children produced longer medial VOTs and pointing to age-related individual differences. This is consistent with previous research suggesting that voiceless stops and affricates are acquired (in production) later than their voiced counterparts.
Note: * = p < .05, ** = p < .01, *** = p < .001.
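Because the factors are treatment-coded, the word-medial VOT coefficients reported above combine additively into cell estimates. The following sketch works through that arithmetic using only the coefficients quoted in the text. It assumes that the age coefficient belongs to the same model and that age is held at its mean (standardised age = 0) unless stated otherwise. Random effects are ignored.

```python
# Fixed-effect estimates (ms) as quoted in the text for word-medial VOT, Model 3
INTERCEPT = -9.695     # voiced labial (reference cell)
B_VOICELESS = 80.953   # voiceless vs. voiced, at the labial reference level
B_POA = {"labial": 0.0, "alveolar": 19.762, "velar": 38.054}
B_INTERACTION = {"labial": 0.0, "alveolar": -63.634, "velar": -61.069}
B_AGE = 9.370          # per SD of age (assumed to come from the same model)


def predicted_vot(poa: str, voiceless: bool, age_z: float = 0.0) -> float:
    """Sum the treatment-coded terms that apply to one POA x voicing cell."""
    est = INTERCEPT + B_POA[poa] + B_AGE * age_z
    if voiceless:
        est += B_VOICELESS + B_INTERACTION[poa]
    return est


for poa in B_POA:
    voiced = predicted_vot(poa, voiceless=False)
    voiceless = predicted_vot(poa, voiceless=True)
    print(f"{poa:9s} voiced = {voiced:7.3f}  voiceless = {voiceless:7.3f}  "
          f"difference = {voiceless - voiced:7.3f}")
```

On these estimates the voicing difference is about 81 ms for labials but only about 17 ms for alveolars and 20 ms for velars, which is the pattern that the negative interaction terms express.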
We also analysed the constriction durations in word-medial stops, and the data were again used to fit three mixed-effects models, see Table 6. By comparing the AIC and BIC metrics, the full model (Model 3) again showed the best fit overall, and we chose Model 3 as the final model for interpreting the patterns. At the reference level, voiced labial stops had an estimated constriction duration of 89.093 ms (p < .0001). Notably, voiced alveolar stops tended to have significantly shorter constriction durations (β = -62.929, p = .0086). Voiceless labial stops tended to have longer constriction durations, but the coefficient did not reach significance (β = -43.185, p = .0647). To further explore this effect, we carried out a Kenward-Roger F-test on the whole model, which revealed a main effect of voicing, F(1, 10.932) = 9.776, p = .0097, as well as a significant main effect of POA, F(2, 10.624) = 9.876, p = .0038, while the POA × voicing interaction effect was not significant, F(2, 10.641) = 0.057, p = .9445. This additional test indicated that there was indeed a difference in constriction duration between voiced and voiceless stops across the POAs. Therefore, and as for the VOT data, we calculated the effect sizes for the constriction duration differences and observed large-sized d metrics across all three POAs: for labials, d = 1.62, 95% CI = [-0.15, 3.39]; for alveolars, d = 1.24, 95% CI = [-0.68, 3.16]; and for velars, d = 1.36, 95% CI = [-0.17, 2.88]. To conclude, Kriol children produced voiceless stops with significantly longer constriction durations as compared to the voiced stops, and the constriction duration-based distinction was similarly robust across the three POAs, unlike the VOT differences where alveolars and velars only had medium-sized differences.
Note: * = p < .05, ** = p < .01, *** = p < .001.
Together, the word-initial and word-medial results indicate that young Kriol-speaking children use VOT and constriction duration information to differentiate Kriol stops /p b/, /t d/, and /k ɡ/ as well as (in initial position at least) affricates /tʃ/ and /dʒ/, in a way similar to that of adult Kriol speakers (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019). There is no indication in this dataset that child Kriol speakers are acquiring a language in which maintenance of VOT and constriction duration contrasts is optional or unusually variable, causing the values for each member of a contrast to overlap substantially. The implementation of a constriction duration contrast word-medially further indicates that these stop contrasts have not been acquired through contact with the lexifier/L2 English in a formal educational setting: the Kriol children produce medial voiceless stops differing from voiceless stops in English, as VOT is the primary cue in English (see Table 2) while Kriol relies on constriction duration, perhaps even more so than VOT. This is consistent with the results reviewed for children acquiring non-Creole languages as well as French-lexified Haitian Creole (Archer et al., 2018) and English-lexified Jamaican Creole (León et al., 2022), which all demonstrate that children acquire the phonological inventory and the language-specific phonetic realisations of these phonemes (including stop voicing) of their linguistic community/caregivers. The results also indicate that Kriol-acquiring children agree on the lexical specification of familiar Kriol words: there is no indication that the children individually determine a preferred lexical specification for each Kriol word presented.
In order to address the question of variable implementation of a VOT and/or constriction duration contrast in child Kriol, and the question of whether access to English in a formal school setting induces an English-like VOT-based contrast in child Kriol, we further investigated the data from the individual participant children. The relatively low number of tokens obtained from each child, and the fact that some children did not produce all target consonants in both word-initial and -medial position, make individual statistical analysis unfeasible, and we adopt a purely descriptive approach out of necessity. The individual data are presented in Figures 4, 5, and 7, for word-initial VOT, word-medial VOT, and word-medial constriction duration, respectively (see Appendices 1a and 1b for the number of tokens available for analysis for each child).
Word-initially, almost all children produced voiceless stop consonants with systematically longer VOTs, indicated by the number above each child’s mean voiced and voiceless values. The only exceptions to a highly consistent pattern appear to be KC04 (initial labial), KC01 and KC03 (initial alveolar), and KC12 (initial velar), where this difference falls below 20 ms, and arguably indicates that no contrast can reliably be perceived.
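The descriptive criterion applied here (a mean voiceless-minus-voiced VOT difference of at least 20 ms) can be expressed as a short Python sketch. The child labels and VOT values below are invented and do not correspond to participants KC01 to KC13. A child without tokens in both categories is left unassessed rather than counted as lacking a contrast.

```python
from statistics import mean

THRESHOLD_MS = 20.0  # differences below this are treated as no reliable contrast


def contrast_by_child(tokens):
    """tokens: iterable of (child_id, voicing, vot_ms), voicing in {'voiced', 'voiceless'}.

    Returns {child_id: (mean difference in ms, contrast maintained?)}, or None for a
    child who lacks tokens in one of the two categories.
    """
    grouped = {}
    for child, voicing, vot in tokens:
        grouped.setdefault(child, {"voiced": [], "voiceless": []})[voicing].append(vot)

    result = {}
    for child, groups in grouped.items():
        if not groups["voiced"] or not groups["voiceless"]:
            result[child] = None  # insufficient data to assess
            continue
        diff = mean(groups["voiceless"]) - mean(groups["voiced"])
        result[child] = (diff, diff >= THRESHOLD_MS)
    return result


# Hypothetical word-initial labial tokens for three children
tokens = [
    ("C_A", "voiced", 10), ("C_A", "voiced", 14),
    ("C_A", "voiceless", 60), ("C_A", "voiceless", 72),
    ("C_B", "voiced", 12), ("C_B", "voiced", 18),
    ("C_B", "voiceless", 25), ("C_B", "voiceless", 29),
    ("C_C", "voiced", 9),
]

summary = contrast_by_child(tokens)
for child, outcome in summary.items():
    print(child, outcome)
```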
Word-medially, the pattern of VOT contrast maintenance is less straightforward than in initial position (see Figure 5). All children, but one, maintain a contrast at the bilabial POA, but only two of seven children with sufficient individual data to allow for assessment maintain a contrast at the alveolar POA. At the velar POA, only six of 13 children maintain a contrast. The velar results are however consistent with observations from adult Kriol, that VOT is a secondary (optional) cue to contrast maintenance at the velar place of articulation in word-medial position, while duration is the primary cue. Since the regression analysis showed previously that age was a significant predictor in medial VOT production, we additionally present the individual means against the children’s age in a scatter plot (see Figure 6). As can be seen from the graph, Kriol-speaking children showed an age-related pattern, such that the VOT (but not the constriction duration) distinction is initially less clear in the medial stops for alveolars and velars.
The individual measurements for word-medial constriction duration (see Figure 7) indicate that all children, irrespective of their age, implement a clear duration contrast for bilabial and velar stops. The pattern is less clear for the alveolar POA, where only three of seven children with sufficient data to compare voiced and voiceless stop durations produce a difference in constriction duration greater than 20 ms. Three of these children (KC01, KC02 and KC07) also fail to differentiate /t d/ in terms of VOT. Again, it is possible, and plausible, that a tendency to tap intervocalic /t/ explains at least some of this variation between the children, and with this caveat, the results are consistent with contrast maintenance overall.
Study 1: Discussion
Study 1 examined the VOT and constriction duration characteristics of Kriol stops and affricates produced by 13 children from Beswick/Wugularr. The group and individual results indicate that the Kriol-speaking children produce two series of stops and affricates that systematically differ in VOT and (medially) also in constriction duration. Specifically, ‘voiceless stops’ are characterised by long VOTs and by long constriction durations in at least word-medial position where it can be measured, while voiced stops are characterised by short positive VOTs or pre-voicing in the case of medial /b/, and much shorter constriction duration. This is consistent with what has been reported for adult Kriol (Baker et al., 2014). The results also indicate that older children produce longer medial voiceless stop VOTs at the alveolar and velar POAs. In the case of the alveolar contrast, we speculate that the availability of a tapped allophone may influence the results (Tollfree, 2001). In the case of the velar contrast, we note that previous research with adult Kriol speakers, as well as speakers of the mixed language Light Warlpiri, has demonstrated that VOT is not consistently implemented as a cue to contrast, while the constriction duration difference is highly consistently implemented. The individual results, though descriptive, are largely consistent with the group results, showing that Kriol-speaking children implement very similar stop contrasts. We suspect that cases where contrast maintenance is questionable in the present dataset are due to random variation in very small datasets, and perhaps occasional confusion about the lexical specifications for a particular word.
Neither group nor individual results are consistent with suggestions that the use of VOT and constriction duration to differentiate voiced and voiceless stops is highly variable within and between speakers in Kriol. Bar the /t d/ realisations produced by Child 1 and Child 4, all children maintain each contrast in at least one of the measures obtained, indicating consistency within and between speakers, particularly in the light of the fact that child speech is more variable than adult speech. Such variability does not obscure language-specific phoneme boundaries.
The results from Study 1 are thus inconsistent with suggestions that Kriol does not have VOT/duration-based contrasts with clear target realisations, that contrast maintenance is optional, and that VOT-based contrasts are L2 English contrasts transferred into Kriol. All children in the present study implement VOT/constriction duration contrasts consistent with those in adult Kriol, and the phonetic characteristics of their stop and affricate contrasts are un-English-like – but very Kriol-like – in the use of constriction duration information in conjunction with VOT information (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019). We thus observe no evidence to support the claim that the production of VOT-based stop contrasts (and stop-fricative contrasts) in Kriol is induced by formal English-language schooling (Stewart et al., 2018, 2020): all children produce VOT- and constriction duration-based contrasts, irrespective of their age, which in the present study is taken as proxy for the quantity of English that the children have been exposed to through formal schooling in English.
In what follows, we complement the production task with a mispronunciation detection task, to determine whether Kriol-speaking children perceive changes to Kriol lexemes in terms of voicing, manner, or place features and/or more substantial changes (multiple feature deviations or vowel changes, which we denote as ‘Substitution’ here), and whether there is evidence that contact with English through formal schooling results in changes to their perceptual behaviour.
Study 2: Mispronunciation Detection
In order to examine stop and affricate perception in Kriol-speaking children, we followed Study 1 with a Mispronunciation Detection task (Study 2). The mispronunciation detection task employed here is a variant of a 2-Alternative Forced Choice (2AFC) paradigm in which participants are presented with either correctly produced or incorrectly produced spoken-word stimuli (typically a concrete noun paired with an image of the object). The participant is then asked to indicate whether the spoken stimulus was correctly or incorrectly produced, and the number of ‘accepted’ or ‘correct’ answers is statistically compared across the different types of mispronunciations examined (voicing, manner, place, substitution, and no mispronunciation). This paradigm relies on the fact that mispronunciation rejection can only occur if a participant perceives a given mispronounced stimulus word as a deviation from a known lexical target. In the case of the present study, all mispronunciations consisted of a deviation in a single phoneme, and perceiving the mispronunciation thus relies on the presence of a phonological contrast between the target phone in a lexical item and the substituted or mispronounced phone in the mispronounced stimulus word.
The participants in Study 2 were the same children from Beswick who participated in Study 1 reported above. Testing took place in the same quiet location as Study 1, and Study 2 was conducted immediately after the completion of Study 1.
Materials and procedure
The target words for the mispronunciation detection study were a selection of 24 easily depictable Kriol nouns, such as (the equivalents of English) door, book, turtle, bottle, used in the production study reported above, plus an additional set of words not used for elicitation (see Appendix 2 for the full list as well as the mispronunciation manipulations). The target words were elicited from a literate female native speaker of Kriol in her 40s (from a different community, and we assume not familiar to the children), using orthographic prompts. The speaker produced five repetitions of each target word in a child-directed speech style. Stimulus selection was based on the auditory impressions of the first and second authors and focused on identifying targets with similar speaking volume, speaking rate, and intonation.
Each of the target images was presented with the correctly pronounced target word, as well as 2-5 mispronounced forms, resulting in a total of 99 test items of which 25 (25%) were produced according to the correct phonemic specification in Kriol. Each stimulus word belonged to one of five categories:
1) Unmodified tokens (correctly produced; 25 trials);
2) Voicing-modified tokens (a single-feature change such as /deibul/ for /teibul/ Eng. table; 15 trials);
3) Manner of articulation (MOA)-modified tokens (a single-feature change such as /seibul/ for /teibul/ Eng. table; 15 trials);
4) Place of articulation (POA)-modified tokens (a single-feature change such as /ɡakit/ for /bakit/ Eng. bucket; 20 trials); and
5) Substitution: a combination of feature changes (place/manner/voice) such as /beibul/ for /teibul/ Eng. table, or a vowel substitution such as /bebul/ for /babul/ Eng. bubble (24 trials).
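As a quick arithmetic check on the design, the category sizes listed above can be tallied in a few lines of Python. The dictionary keys are shorthand labels introduced here for illustration.

```python
# Trial counts per stimulus category, as listed above
trials = {
    "unmodified": 25,
    "voicing": 15,
    "manner": 15,
    "place": 20,
    "substitution": 24,
}

total = sum(trials.values())
mispronounced = total - trials["unmodified"]
share = {category: count / total for category, count in trials.items()}

print("Total trials:", total)
print("Mispronounced trials:", mispronounced)
for category, proportion in share.items():
    print(f"{category:12s} {proportion:.1%}")
```

Roughly one trial in four is correctly pronounced, so a child who accepted every item would be correct on only about a quarter of the trials.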
Fourteen of the 15 voicing mispronunciations targeted the initial consonant of the target word, and only one item contained a word-medial mispronunciation (/taiɡa/ to /taika/) due to the requirement that each item was an easily depictable noun, generally familiar to children in Beswick. None of these mispronounced forms resulted in a shift to another Kriol lexical item. The targets were presented in a fixed, but randomised order generated by an online service, and the presentation order was checked to ensure that no two instances of the same target (mispronounced or correctly pronounced) were presented in succession at any point (see Appendix 2). This departs from standard laboratory best practice but provides necessary field-testing flexibility in remote communities, where control of the testing environment is limited, and test interruptions of many kinds are frequent.
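A fixed but randomised order with the adjacency constraint described above could be produced along the following lines. This is a stand-in for the unnamed online service, not a reconstruction of it: the sketch uses seeded rejection sampling, and the short stimulus list is a hypothetical subset built from forms mentioned in the text.

```python
import random


def constrained_order(items, seed, max_tries=10_000):
    """Shuffle (target, stimulus) pairs until no two neighbours share a target.

    A fixed seed makes the order reproducible, so every child can receive the same list.
    """
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(a[0] != b[0] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; relax the constraint or raise max_tries")


# Hypothetical subset of stimuli: (canonical target, presented form)
stimuli = [
    ("teibul", "teibul"), ("teibul", "deibul"), ("teibul", "seibul"), ("teibul", "beibul"),
    ("bakit", "bakit"), ("bakit", "ɡakit"),
    ("babul", "babul"), ("babul", "bebul"),
    ("taiɡa", "taiɡa"), ("taiɡa", "taika"),
]

order = constrained_order(stimuli, seed=2024)
for target, form in order:
    print(f"{target:8s} -> /{form}/")
```

Rejection sampling is adequate for short lists like this one; for a full 99-item list with many repeats per target, a constructive algorithm may be preferable.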
We tested the children’s sensitivity to phoneme mispronunciations in the following way: each participant was seated at a table in a quiet room in Beswick, in front of a PowerPoint display containing the pseudo-randomised presentations of the 99 picture and voice prompts. Each picture was displayed with a pre-recorded Kriol label for the object, spoken in a child-directed speech style by the Kriol speaker described in the previous section. Following the presentation of the picture and the oral label, the children were required to reply to the question imin tok raitwei o rongwei? (‘Did s/he say it correctly or incorrectly?’) in Kriol. If the children wished to listen to an item again, it was replayed, and a response was recorded for each test item.
The task was explained to the children in Kriol, as well as in English. The task was administered by a researcher seated next to the child, and each child’s responses were scored by a researcher discreetly placed behind the child and confirmed against recordings of each testing session (for verbal replies). The children typically responded verbally (raitwei ‘correct’/rongwei ‘incorrect’) though some responded by nodding their head for ‘correct’ and shaking their heads for ‘incorrect’. Eleven children (of the total number of 13 children from Beswick who participated in Study 1) completed the task and were included in the analyses presented below. Two children were excluded from the analyses of Study 2 because they did not complete the task (Child 4 [64mo] and Child 8 [75mo]).
Study 2: Predictions
The results from Study 1, and the reviewed material in the Introduction, again allow us to make a number of general predictions for the ability of Kriol-speaking children to perceive a range of phonological changes to familiar words. Firstly, under a Creole Continuum Model, it would be expected, or at least overwhelmingly likely, that Kriol-speaking children demonstrate a high degree of tolerance for mispronunciations of words that result from the basilectal neutralisations of either stop voicing distinctions, or stop-fricative distinctions, given that such neutralisations remain consistent with a basilectal phonological inventory. Mispronunciations that pertain to place of articulation, or multi-feature/full phoneme substitutions, however, should still be unacceptable to children under this model, given that they are not licensed by any of the phonologies along the Kriol phonological continuum.
Predictions based on the (exclusively) non-Creole L1 segmental acquisition literature differ from those made under a Continuum Model. Here, children would be predicted to have (relatively) stable phonological representations of common and familiar words (even if they may struggle with pronunciation themselves), and successfully accept words with canonical phonological specifications and reject words with phonological mispronunciations of any kind. The children would be expected to reject single-feature mispronunciations of all types equally well, irrespective of whether they involve voicing, manner or place, but might find it easier (White & Morgan, 2008) to reject mispronunciations which deviate in two or three features than those which differ from the canonical form in a single feature. The expected symmetrical rejection of all types of mispronunciations contrasts with the expectations we can generate under the Continuum Model (i.e., better ability to reject place of articulation-based mispronunciations than voicing- and manner-based ones).
Study 2: Results
Since the children’s responses were binary (‘accept’ was coded as 1, and ‘reject’ was coded as 0), their acceptance rates were analysed using generalised linear mixed-effects modelling (GLMM, binomial family), and the descriptive patterns are summarised in Figure 8. As with the production data analysis in Study 1, we built a series of three models with different levels of complexity (see Table 7): Model 1 took phonological modification type as the only fixed factor; Model 2 also took modification type as a fixed factor, while the Kriol children’s age (in months, standardised) was also included as a numeric covariate; finally, Model 3 further considered the interactions between Kriol children’s age and different modification types. The models were evaluated for their performance, and Model 3 showed the lowest AIC and BIC metrics, indicating the best goodness-of-fit. Model 3 took the unmodified stimuli as the reference level, and it indicated that all four modification types (i.e., Voicing change, Manner of articulation (MOA) change, Place of articulation (POA) change, and Substitution) differed significantly from the reference level (p’s < .0001). Additionally, the Kriol children’s age was also indicated to interact with the four modification types (p’s < .0011). This pattern can also be seen from the descriptive results (Figure 8), such that: (1) the unmodified stimuli received much higher acceptance rates as compared to the four modification types, and (2) there was a tendency for older children to have lower acceptance rates for the modified stimuli, while the acceptance rate for unmodified stimuli tended not to correlate with the children’s age.
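To show how an age × modification-type interaction in such a model translates into acceptance probabilities, the sketch below evaluates a logistic linear predictor for each condition. All coefficients are invented for illustration and are not the estimates in Table 7. The sketch also assumes the usual logit link for a binomial model and leaves out random effects.

```python
import math


def inv_logit(eta: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fixed effects on the log-odds scale (reference level: unmodified stimuli)
INTERCEPT = 2.0
B_TYPE = {"unmodified": 0.0, "voicing": -2.0, "manner": -2.3, "place": -2.4, "substitution": -2.8}
B_AGE = 0.0  # age slope at the reference level (unmodified stimuli)
B_AGE_X_TYPE = {"unmodified": 0.0, "voicing": -0.8, "manner": -0.9, "place": -0.9, "substitution": -1.0}


def p_accept(mod_type: str, age_z: float) -> float:
    """Predicted probability of an 'accept' response for a child of standardised age age_z."""
    eta = INTERCEPT + B_TYPE[mod_type] + (B_AGE + B_AGE_X_TYPE[mod_type]) * age_z
    return inv_logit(eta)


for mod_type in B_TYPE:
    younger = p_accept(mod_type, -1.0)
    older = p_accept(mod_type, 1.0)
    print(f"{mod_type:12s} younger (-1 SD): {younger:.2f}   older (+1 SD): {older:.2f}")
```

With negative interaction terms of this kind, acceptance of mispronounced forms falls with age while acceptance of unmodified forms stays flat, which is the qualitative pattern described for Figure 8.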
Note: * = p < .05, ** = p < .01, *** = p < .001.
In order to assess the pairwise differences between the five modification types (Unmodified, Voicing change, MOA change, POA change, and Substitution) whilst controlling for the variable of child age, we carried out a series of post hoc tests based on Model 3 (see Table 8), which revealed that the acceptance rates of the unmodified stimuli were significantly higher than all four modification types (p < .0001 for four comparisons, Bonferroni-adjusted), and the acceptance rates of Voicing change were higher than Substitution (p = .0198, Bonferroni-adjusted). No other comparisons were significant.
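The Bonferroni adjustment itself is simple to illustrate. The sketch assumes that the correction was applied across all pairwise comparisons among the five stimulus types, which the text does not state explicitly. The unadjusted p-values are hypothetical.

```python
from itertools import combinations

TYPES = ["Unmodified", "Voicing", "MOA", "POA", "Substitution"]
pairs = list(combinations(TYPES, 2))  # every unordered pair of stimulus types


def bonferroni(p_raw: float, m: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of comparisons, cap at 1."""
    return min(1.0, p_raw * m)


print("Number of pairwise comparisons:", len(pairs))

# Hypothetical unadjusted p-values for two of the comparisons
raw = {("Voicing", "Substitution"): 0.002, ("MOA", "POA"): 0.3}
for pair, p in raw.items():
    print(pair, "raw p =", p, "adjusted p =", round(bonferroni(p, len(pairs)), 4))
```

With five stimulus types there are ten pairwise comparisons, so an unadjusted p-value must fall below .005 to remain below .05 after correction.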
Study 2: Discussion
Study 2 examined the ability of the Kriol-speaking child participants to correctly accept Kriol words in their canonical form and correctly reject those same Kriol words when they have been mispronounced by varying a single feature (voicing, manner, or place of articulation), or by changing the lexical specification of the word by more than one feature of a consonant or vowel phoneme (substitution). The group results show that Kriol-speaking children accept Kriol words in their canonical form, and get better at rejecting all types of mispronunciations with age. Importantly for the question of voicing contrasts in child Kriol, the results show that the children reject voicing-based mispronunciations just as they reject other single-feature changes (in manner or place of articulation) and mispronunciations involving two or more features (voicing, POA and MOA). This indicates that VOT, just like manner and place, is a linguistic variable that the children systematically manipulate for lexical contrast maintenance in Kriol, despite the reported variation in the input particularly with respect to VOT.
Overall, the results thus indicate that Kriol-speaking children develop stable, canonical entries in their mental L1 Kriol lexicon in addition to well-established phonological categories and knowledge of the degree of phonetic variation that is permissible in the realisation of each of these phonemes. This is consistent with the speech production study reported in Study 1, which shows that Kriol-speaking children’s productions of voiced and voiceless stops differ systematically in terms of VOT and constriction duration.
The results from Study 2 are thus at odds with previous reports of high degrees of variation in lexical specifications of (adult) Kriol words (Sandefur, 1979). They are also inconsistent with prior claims that the use of VOT and constriction duration to maintain stop voicing contrasts in Kriol is irregular for adults and older children (Stewart et al., 2018, 2020). Rather, the results are consistent with recent studies indicating that Kriol speakers can discriminate voiced and voiceless Kriol-like stops (Bundgaard-Nielsen & Baker, 2019) and have lexemes with canonical forms, recognised as such by L1 speakers (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016).
Finally, because there is no evidence in our data that voicing-based mispronunciation detection differs from other types of (single-feature) mispronunciation detection, we find no support for claims of an ongoing decreolisation process (Sandefur, 1979; Stewart et al., 2018, 2020). Rather than decreolisation underpinning the improved performance for the older children relative to the younger children in terms of VOT perception (i.e., systematic shifting towards the acrolectal end of the continuum due to contact with the lexifier at school), the parallel patterns observed for VOT mispronunciations and other single-feature mispronunciations might be better accounted for as a single process of increased cognitive capacity, better phonemic awareness, and improvements in the children’s test-taking skills.
Alternatively, or in addition, this improvement with age may reflect young children’s vocabularies having fewer entries than those of older children, leading to fewer phonological neighbours for each entry and accommodating phonologically underspecified representations of words (e.g., Metsala, 1999), until the vocabulary fills in. This is consistent with other reports which show that attending to fine-grained acoustic differences between native phones can be very difficult for young children in word-learning or mispronunciation tasks, even when those contrasts are readily discriminated outside of word-based tasks (see, for instance, Stager & Werker, 1997; Swingley & Aslin, 2007).
We reported here on two studies which examined the production (Study 1) and perception (Study 2) of stop and affricate voicing contrasts in child Kriol as it is spoken in the community of Beswick/Wugularr in the Northern Territory of Australia. Contrary to earlier reports of high degrees of inter- and intra-speaker variation in contrast maintenance and the phonological specifications for Kriol words (e.g., Jones & Meakins, 2013; Sandefur, 1986; Stewart et al., 2018, 2020), the results from Study 1 (stop and affricate production) indicate that children acquiring Kriol implement stop and affricate contrasts using both VOT and constriction duration, in a manner consistent with what has been reported for adult speakers of Kriol elsewhere, and reflective of the phonetic characteristics of stops in some of the substrate languages, from the onset of formal schooling (Baker et al., 2014; Bundgaard-Nielsen & Baker, 2016, 2019).
The results from Study 2 (mispronunciation detection) likewise indicate that Kriol-speaking children accept correctly produced familiar words in Kriol from the onset of formal schooling, and reject mispronounced words irrespective of whether these are single-feature deviations in voicing, manner, or place characteristics of stop consonants, with increasing confidence as they get older. The Kriol-speaking children also reject words with more substantial deviations (multiple features) or vowel substitutions. The fact that children improve in their ability to reject VOT-based mispronunciations in the same manner as they improve in the ability to reject manner and POA-based mispronunciations offers no support for the claim that VOT contrasts are acquired or enhanced differently from other single-feature contrasts through a process of decreolisation. The results from Studies 1 and 2 also clearly demonstrate that, despite past claims, there is no evidence of random variation in terms of the phonological specifications for words (the children agree on the phonological shape of familiar words), and consequently in terms of the phonological system in the data presented here.
The results presented here are thus generally consistent with what has been reported for children speaking Haitian Creole (Archer et al., 2018) and Jamaican Creole (León et al., 2022), as well as for children acquiring non-Creole languages. The absence of extreme variability in the phonological specifications of lexemes is comforting in two ways. Firstly, such variability would present challenges to children acquiring any language with similar levels of variation, beyond what children in multilingual and multidialectal societies experience, and would perhaps be akin to what one might imagine in a perpetual situation of creolisation, or perhaps more aptly a perpetual situation of re-creolisation by each successive generation of children. Secondly, accounting for L1 language acquisition under a scenario of random variation would likely require substantial revisions to theories of segmental acquisition and processing (Best, 1994, 1995; Flege, 1991; Kuhl et al., 2008). We are at pains to stress that the challenge to theories of acquisition lies not in variation per se (including dialectal or L2-induced variation, of which these children presumably receive quite a substantial quantity), but in unpredictable variation à la Sandefur (1986). The children in Beswick, however, do not appear to be tasked with such a problem: despite being a linguistically diverse community with many L2 users, the Kriol-speaking Beswick community is not one in which children are faced with insurmountable variation in their language input, nor is it one in which re- or de-creolisation are synchronic processes (see Footnote 3).
The behaviour of the children in the tasks reported on here suggests that they approach the acquisition of their L1 in a similar way to children acquiring non-Creole languages, and that this process happens in a synchronically relatively stable and predictable linguistic landscape, irrespective of diachronic linguistic upheaval and language formation just a few generations back.
Questions of intergenerational transmission aside, the fact that a stable variety of Kriol is being transmitted to children should not be taken to mean that Kriol communities are linguistically homogeneous, and that is not our position. Many, if not all, Kriol-speaking communities have complex language ecologies, characterised by high degrees of multilingualism in the population, as well as a substantial number of L2 users of many of the community languages, including Kriol. Such variation is, however, neither unstructured nor unpredictable, and, as stated above, systematic (L2-induced) variation does not interfere with intergenerational transmission of Kriol as a stable L1. As demonstrated in, e.g., Best et al. (2009), Mulak et al. (2013), and White & Aslin (2011), child language acquisition is flexible and efficient enough to be able to deal with systematic variation in the input from specific speakers, including L2 users, and speakers from other dialects/varieties of the same language. We wonder, however, whether children acquiring Kriol would show evidence of taking a little longer to establish their L1 phonology, like the Greek Cypriot children discussed in the introduction (Okalidou et al., 2010), for instance, due to the potentially high degree of variation in the input; studies of acquisition (perception and production) with Kriol-acquiring infants and children younger than those included here would help resolve this question.
The studies reported here have obvious implications for the description of present-day Kriol, as well as for theories of Creole formation, the concept of the Creole Continuum, and our understanding of the processes of creolisation and de-creolisation. The results contribute to growing evidence that Kriol in Australia has become a stable language (in the sense that any language can be described as stable), which is neither in the process of forming (through continuous re-creolisation based on high degrees of unstructured variation), nor ‘evolving’ or ‘sliding along a continuum towards the acrolect’ due to continued contact with the historical lexifier English (cf. Meakins et al., 2019). The results also suggest that de-creolisation is not an inevitable process and should not be assumed to be a synchronic factor in every language contact situation, even when contact continues between the Creole and the lexifier (cf. Siegel, 2008, and Winford, 1997, who make similar points).
The studies reported here also have implications for education, given that Kriol-speaking children receive formal instruction in English in the Australian education system. The demonstrated differences in the phonological inventories of Australian English and Kriol, and the differences in the phonetic implementation of shared phonemic contrasts, as well as very substantial differences in other linguistic domains and lexicon, show clearly that L1 Kriol-speaking children are not speakers of English, and that they do not effortlessly ‘slide’ into a version of Kriol that is ‘close enough’ to pass for English as a consequence of formal schooling in English. This means that Kriol-speaking children face difficulties similar to those of children with other non-English backgrounds who enter the education system in Australia without any substantial competence in English as an additional language, and the research presented here highlights Kriol-speaking children’s need for language and educational support. This is perhaps particularly the case in the domain of literacy, given the differences between English and Kriol phonologies, and the differences in the phonetic realisations of phonemes shared by the two languages, even in shared lexical items.
In conclusion, the present studies of child Kriol obstruent production and perception indicate that Kriol-speaking children have canonical representations of familiar words, and a stable phonological inventory similar to that reported for adult speakers of Kriol. The results do not indicate unusually high degrees of unsystematic variation in child Kriol or in Kriol in general, but are rather indicative of predictable intergenerational transmission, and a phonological acquisition trajectory similar to that which infants and children acquiring non-Creole languages exhibit. This is consistent with a framework in which the observable linguistic variation in Kriol is seen to reflect a large number of L2 speakers who have varying, though internally consistent, degrees of L2-accentedness, rather than with a Continuum Model of variation. The results are not consistent with a de-creolisation process, in which the variation reflects a fluid segmental inventory and flexible lexical specifications of Kriol words, and a progressive alignment with the lexifier English.
We thank Mandy Manggurra, Lenny Joshua†, Hilda Ngalmi, the children who participated in this study, and their families who supported and encouraged our research. We also thank Dr. Stephen and Ms. Joanne Hill for their generous assistance and hospitality, and Associate Professor Barbara Kelly† for her encouragement, advice, and enthusiasm. We gratefully thank the Australian Research Council for funding the research reported here through the Discovery Project program (Grant DP130102624) to the first two authors, and we thank the National Science Foundation and the Australian Academy of Science for work supported through an EAPSI fellowship (award #1515018) to the third author.