Introduction
Language acquisition is one of the most critical developmental milestones in early childhood. It is central to learning, socializing and forming relationships. In the United States, Kindergarten language scores have repeatedly been shown to be the single best predictor of school achievement in third and fifth grade (Durham, Farkas, Hammer, Tomblin & Catts, 2007; Pace, Burchinal, Alper, Hirsh-Pasek & Golinkoff, 2019), and children who have better language skills are more successful in regulating their emotions (Cole, Armstrong & Pemberton, 2010). One factor known to influence language development is a child’s language learning environment. The seminal work by Hart and Risley (1995) demonstrated that the total amount of speech heard by an infant is highly correlated with their language outcomes. Children whose parents talk less tend to have smaller vocabularies by the time they are three years old. This difference, known as the “30-million-word gap”, predicts children’s IQ scores and academic success in grade school (Golinkoff, Hoff, Rowe, Tamis-LeMonda & Hirsh-Pasek, 2019; Hart & Risley, 1995).
Until recently, studies examining children’s language input relied on time-consuming transcription of parental and child language, limiting the amount of data that could be collected and analyzed. Technological advances now allow for longer, more ecologically valid recordings of children’s naturalistic language environments. In recent years, one of the main approaches for measuring children’s language input has been the Language Environment Analysis system (LENA; Greenwood, Thiemann-Bourque, Walker, Buzhardt & Gilkerson, 2011). An important advantage of LENA is that it facilitates audio recordings in children’s natural environments on a day-long timescale, and is supplemented by automated speech analyses. Studies using LENA have confirmed significant variation in the amount of language children experience in association with parental language input, though the size of the “word gap” has recently been proposed to be substantially smaller than 30 million words (i.e., around 4 million words; see Gilkerson, Richards, Warren, Montgomery, Greenwood, Oller, Hansen & Paul, 2017). In addition, studies following these key discoveries have noted that parental language behaviors are influenced by a variety of social and cultural factors, such as policies, beliefs, values and political systems, among others (see Rowe & Weisleder, 2020 for a recent review). Together, this large and growing body of research has come to a more fine-tuned conclusion: Language develops in context. Children learn the language(s) that are used around them, and their early social interactions with language shape their language learning trajectories. 
Studies conducted in industrialized countries, such as the United States, demonstrate that the number of words that infants hear alone is insufficient to account for the observed variation in children’s language development; the quality of language input also needs to be considered (see Hirsh-Pasek, Adamson, Bakeman, Owen, Golinkoff, Pace, Yust & Suma, 2015; Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola & Nelson, 2008; Rowe, 2012; Tamis-LeMonda, Kuchirko & Song, 2014; Weisleder & Fernald, 2013).
Unlike input quantity, which is straightforward to measure (researchers typically use adult word counts), the quality of caregiver language input can be measured in various ways, and specific aspects of input may be more or less important depending on the child’s age and/or level of language development. Rowe and Snow (2020) have recently conceptualized the features of high-quality caregiver input that facilitate language development in terms of three dimensions: linguistic, interactive, and conceptual. Linguistic features include levels of linguistic complexity, repetition, and redundancy that are adapted to the child’s age or developmental stage. Interactive features include periods of joint attention, interactive play, parental responsiveness, and reciprocity. Conceptual features include topics of conversation that provide appropriate challenges for the child’s age or developmental stage. According to this model, learning is optimal when each of the three dimensions is maximized, and may be hindered if any dimension is minimized. Within this framework, an important goal of developmental language research is to identify aspects of language input that work across these three dimensions, as they may be particularly potent predictors of later language abilities.
Social-interactionist and sociocultural theories have long emphasized the importance of children’s early social experiences for language development, showing that infants (children under age 2 years) benefit enormously from the social and interactional features of language input (e.g., Bruner, 1981; Kuhl, 2007; Snow, 1977a; Vygotsky, 1979). One key feature of social language interactions that could serve as an “ideal” language learning signal in the first 24 months is parentese, the acoustically exaggerated, clear, and higher-pitched speech produced by adults when they address infants. Initially termed “baby talk” (Ferguson, 1964), parentese is distinguished from adult-directed speech (ADS) by a variety of segmental and prosodic features, including higher overall pitch and wider pitch range, slowed speech rate, exaggerated intonation contours, fewer and simpler lexical items, shorter utterances, and longer pauses between phrases (Fernald, 1985; Fernald & Simon, 1984; Fernald, Taeschner, Dunn, Papousek, de Boysson-Bardies & Fukui, 1989; Garnica, 1977; Grieser & Kuhl, 1988; Stern, Spieker, Barnett & MacKain, 1983). 
Parentese is used across cultures in spoken and signed languages by parents, grandparents, siblings, teachers, and adults who do not have their own children (Ferguson, 1964; Jacobson, Boersma, Fields, Olson & David, 1983; Kuhl, Andruski, Chistovich, Chistovich, Kozhevnikova, Ryskina, Stolyarova, Sundberg & Lacerda, 1997; Reilly & Bellugi, 1996). For example, one study showed that mothers in the United States, Sweden, and Russia produce acoustically more extreme vowels when addressing their infants than when addressing adults, resulting in an expanded vowel space that provides exceptionally well-specified information about the building blocks of words (Kuhl et al., 1997). Other studies have noted the existence of prosodic features of parentese in languages such as German, Mandarin, Tamil, Tagalog, and Korean (Fernald & Simon, 1984; Grieser & Kuhl, 1988; Narayan & McDermott, 2016). From early on, it was noted that a similar speaking style is also used in other circumstances, such as when addressing a family dog (Hirsh-Pasek & Treiman, 1982; Mitchell, 2001) or in conversations with foreigners (Snow, van Eeden & Muysken, 1981), leading some to wonder whether parentese is a misnomer. However, there is now clear evidence that parentese has some unique properties that distinguish it from other speech registers. 
For instance, parentese, but not dog-directed speech, is characterized by vowel hyperarticulation (Burnham, Kitamura & Vollmer-Conna, 2002; Gergely, Faragó, Galambos & Topál, 2017; Kuhl et al., 1997); along similar lines, foreigner-directed speech lacks the high pitch and positive affect characteristic of parentese (Singh, Morgan & Best, 2002; Uther, Knoll & Burnham, 2007).
It has also long been known that infants prefer parentese over standard ADS from as early as two days after birth (Cooper & Aslin, 1990; see also Fernald, 1985; Fernald & Kuhl, 1987). Infants’ preference for parentese over ADS has recently been confirmed in two large-scale studies across cultures, procedures, languages, and laboratories, in monolingual and bilingual infants (Byers-Heinlein, Tsui, Bergmann, Black, Brown, Carbajal, Durrant, Fennell, Fiévet, Frank, Gampe, Gervain, Gonzalez-Gomez, Hamlin, Havron, Hernik, Kerr, Killam, Klassen, Kosie, Kovács, Lew-Williams, Liu, Mani, Marino, Mastroberardino, Mateu, Noble, Orena, Polka, Potter, Schreiner, Singh, Soderstrom, Sundara, Waddell, Werker & Wermelinger, 2021; The ManyBabies Consortium, 2020). While early scholars warned that caregiver use of parentese may be damaging to children’s language development (McCarthy, 1954), further investigation demonstrated that parentese was fully grammatical, used phonology that avoided complex clusters of consonants (Newport, Gleitman & Gleitman, 1977; Phillips, 1973; Snow, 1977b) and vowels that were temporally and spectrally expanded (Burnham et al., 2002; Kuhl et al., 1997). 
Later laboratory research demonstrated that parentese facilitated infants’ word segmentation (Thiessen, Hill & Saffran, 2005), word recognition (Singh, Nestor, Parikh & Yull, 2009), and fast mapping (Ma, Golinkoff, Houston & Hirsh-Pasek, 2011).
Studies using daylong recordings have allowed researchers to study parentese as it occurs naturally in infants’ day-to-day lives. Studies conducted in the United States with monolingual English-speaking families and bilingual Spanish–English speaking families have shown that most parents use parentese; however, its frequency varies widely, even in families where infants experience high rates of adult talk (Ramírez-Esparza, García-Sierra & Kuhl, 2014, 2016, 2017; see also Ferjan Ramírez, Hippe, Correa, Andert & Baralt, 2022; Shapiro, Hippe & Ferjan Ramírez, 2021). That is, some infants receive most of their language input through parentese, while other infants experience parentese relatively infrequently. Importantly, these studies also demonstrated that higher rates of parentese use in the homes of monolingual and bilingual infants are associated with higher rates of child babbling at one year of age and greater productive vocabularies at 24 and 33 months of age (Ramírez-Esparza et al., 2014, 2016, 2017; see also Ferjan Ramírez et al., 2022; Shapiro et al., 2021). A recent intervention study (Ferjan Ramírez, Lytle, Fish & Kuhl, 2018; Ferjan Ramírez, Lytle & Kuhl, 2020) suggests that the links between caregiver use of parentese and child language learning may be causal. In the Ferjan Ramírez et al. (2020) study, parent coaching increased the rates of caregiver parentese use from 6 to 18 months, and this increase was associated with enhanced growth in infant babbling from 6 to 14 months (Ferjan Ramírez et al., 2018), and greater word production at 18 months (Ferjan Ramírez et al., 2020). Of note, turn-taking as measured automatically by the LENA technology was also enhanced in this intervention.
Taken together, a large and growing body of research suggests that parentese may represent an ideal high-quality signal for language learning in infancy. This is not surprising, considering that parentese has been demonstrated to maximize all three key quality dimensions proposed in the model by Rowe and Snow (2020). Linguistically, key features of parentese are its distinct segmental and prosodic features, which are adjusted to the child’s level in real-time and in accordance with the child’s responses. Interactive features of parentese include the behaviors that frequently co-occur with its use, such as eye-gaze, joint attention, interactive play, reciprocity, contiguity, and connectedness. Conceptually, caregivers tend to use parentese to talk about what is happening in the “here and now”, with frequent pointing and reference to objects or events present in the child’s immediate environment.
Recent studies also suggest an association between parental use of parentese and the frequency of caregiver-infant back-and-forth exchanges (conversational turns), another key mechanism supporting infants’ language uptake (Levinson, 2016). Unlike overheard speech or speech from an electronic source (Kuhl, Tsao & Liu, 2003; Shneidman, Arroyo, Levine & Goldin-Meadow, 2013; Weisleder & Fernald, 2013), turn-taking allows caregivers to provide contingent feedback adjusted to their infant’s linguistic needs. For example, research has shown that mothers adjust their speech in accordance with their infants’ responses (Braarud & Stormark, 2008; Smith & Trainor, 2008). Infants, in turn, adjust their vocalizations, thereby creating a feedback loop that supports language growth (Warlaumont, Richards, Gilkerson & Oller, 2014). Turn-taking provides opportunities for temporal contiguity and contingency and joint engagement between parents and children, which are critical in word learning and predict children’s subsequent language skills (Bornstein, Tamis-LeMonda & Haynes, 1999; Conboy, Brooks, Meltzoff & Kuhl, 2015; Hirsh-Pasek et al., 2015; Tamis-LeMonda et al., 2014). 
Finally, recent brain studies propose that, through contiguity, contingency, connectedness, and social feedback, turn-taking shapes the social circuitry of the language-related brain areas (Merz, Maskus, Melvin, He & Noble, 2020; Romeo, Leonard, Robinson, West, Mackey, Rowe & Gabrieli, 2018a; Romeo, Segaran, Leonard, Robinson, West, Mackey, Yendiki, Rowe & Gabrieli, 2018b).
While there is evidence supporting the short-term benefits of parentese and turn-taking to infant and toddler language learning, the extent to which their consistent use in infancy is associated with longer-term language outcomes remains less clear. While we know that most parents in the United States use parentese when their infants are between 6 and 24 months of age, we also know that there is quite a bit of variability from family to family, and potentially, also within families from one developmental time point to the next (see Shapiro et al., 2021). Within a cascade model of development (Landry, Smith & Swank, 2003; Tamis-LeMonda, Luo, McFadden, Bandel & Vallotton, 2019), a potential hypothesis is that language outcomes observed in later childhood (i.e., at Kindergarten entry) are a reflection of the cumulative impact of language input across early childhood. For example, one study examined the relation between consistency of maternal responsiveness across early childhood (birth to preschool years) and children’s language outcomes at age 8 years in a sample of preterm and term children. Results demonstrated that children who experienced consistently high levels of responsivity across the first four years of life scored higher on language measures at the age of 8 years compared to those children whose maternal responsivity scores were less consistent (Landry et al., 2003). With these findings in mind, one hypothesis is that consistent use of high rates of parentese and/or turn-taking that are stable over time could contribute to longer-term positive outcomes. 
In another study, Gilkerson and colleagues (2018) show that parent-infant conversational turns during a narrow time-window of 18-24 months predicted children’s language scores 10 years later. However, a recent study demonstrates a significant association between parental use of parentese in the first year of life (6-14 months) and parent-infant turn-taking at the age of 18 months (Ferjan Ramírez et al., 2018), suggesting that the stepping stone to later language may be either consistent use of parentese in infancy, consistent turn-taking in infancy, or a combination of the two. Within a cascade model of development, one can hypothesize that when infants are raised in consistently rich, high quality language learning environments, these early experiences instigate a drive on the infant’s part to respond and join in on the conversation, activating developmental cascades. This would suggest that parental consistent use of high parentese rates and/or turn-taking in infancy may benefit children’s language development to the extent that if a child develops sufficiently robust early language skills, this can continue to “drive” language-learning interactions, furthering the child’s lexical and grammatical growth. In children aged 4-6 years, the quantity of parent-child conversational turns has been linked not only to children’s cognitive performance, but also to the function and structure of their language-related brain networks (Romeo et al., 2018a, 2018b; Romeo, Leonard, Grotzinger, Robinson, Takada, Mackey, Scherer, Rowe, West & Gabrieli, 2021). 
However, it is currently unknown whether turn-taking in the preschool years (4-6 years) can be linked to consistent parental use of parentese and/or turn-taking in infancy.
Taken together, the developmental cascades model supports the idea that consistent caregiver use of frequent parentese and/or turn-taking in infancy could serve as a stepping stone to language, catalyzing positive developmental cascades, and supporting longer term robust language development beyond toddlerhood. However, no studies thus far have examined whether consistent parental use of parentese and/or turn-taking in infancy is prospectively associated with children’s later language outcomes, such as grammatical complexity and lexical diversity at Kindergarten entry, or parent-child conversational exchanges at Kindergarten entry, the age at which such exchanges have been shown to predict children’s cognitive skills, brain structure and brain function.
The present study
In the present study, we examine whether consistent parental use of parentese and/or turn-taking in infancy is associated with children’s language complexity and parent-child turn-taking at Kindergarten entry (age 5 years). To answer this question, we collected longitudinal data from 44 children that included:
1) In infancy (from 6 to 24 months of age):
measures of parental input quantity (adult word counts, AWC);
measures of parentese;
measures of parent-infant turn-taking;
measures of infants’ spoken language,
from daylong home language recordings collected at 6, 10, 14, 18 and 24 months;
2) From the same 44 children at the age of 5 years:
measures of parent-child turn-taking;
measures of grammatical complexity (utterance length);
measures of lexical diversity, manually coded from daylong home language recordings.
Based on previous research (Ferjan Ramírez et al., 2018, 2020; Gilkerson et al., 2018), we hypothesized that consistency in parental use of parentese and turn-taking, but not their overall volume of speech (AWC), would predict measures of speech complexity and parent-child turn-taking at Kindergarten entry. By testing these hypotheses, the goal of the present study is to highlight specific aspects of early interactions associated with later language success.
Methods
Participants
Seventy-nine families were recruited through the University of Washington Subject Pool as part of a previously published home language intervention study between 6 and 24 months of age (Ferjan Ramírez et al., 2018; recruitment period: September 2016–January 2017). In this intervention, families provided informed consent and were then randomly assigned to either a Coaching Group or a passive Control Group when the babies were 6 months of age. The intervention parents then received “coaching” to enhance their use of parentese speaking style and turn-taking between 6 and 18 months of age, while the control group did not receive such coaching. All families completed audio recordings of their children and environment at five time points in infancy (when infants were 6, 10, 14, 18, and 24 months old). The recordings were made with the LENA system (Version 3.4.0; LENA, 2015), which provides audio recordings and measures of different components in children’s natural environments (Ferjan Ramírez et al., 2018, 2020; Ferjan Ramírez, Hippe & Kuhl, 2021). All parents in this study (Coaching Group parents and Control Group parents) used parentese and turn-taking between 6 and 24 months of age, though their frequency varied from family to family, and was enhanced in the Coaching Group (Ferjan Ramírez et al., 2020). About 2.5 years after completing the infancy home language intervention, families in both the Coaching and Control groups who agreed to be re-contacted for future research were invited to return for a new, follow-up LENA study when their child was 5 years old (re-recruitment period: February–June 2021). 
All families who agreed to participate in the follow-up (N=70) completed a phone screening interview to determine whether their children met the following criteria:
(1) Child not yet enrolled in Kindergarten, aged between 5 years and 5 years and 4 months;
(2) English is still the primary language of communication in the home;
(3) Child had no apparent congenital, neurological or physical abnormalities.
Exclusion criteria included any brain injury; medications that impact cognition; intellectual disability; Autism Spectrum Disorder; mood disorders; and significant and permanent hearing impairments. After the initial screening process, 52 eligible participants were invited to take part in the follow-up study. Of these, 49 families completed both days of the follow-up LENA recording at 5 years. Of these 49 families, 44 also had all infancy language recordings (i.e., at 6, 10, 14, 18, and 24 months, with no missing data), and were included in the present sample. Of these 44 families (25 with girls, 19 with boys), 33 were in the “Coaching Group” in the Infancy Intervention, and 11 were in the “Control Group” (Ferjan Ramírez et al., 2018, 2020). Note that data collection at age 5 years was not pre-planned (i.e., families had to be re-contacted and re-enrolled). As might be expected, this resulted in the present study enrolling a disproportionately low number of participants from the “Control Group” of the Infancy Intervention, whose parents were contacted less often over the course of the study because they received no coaching (i.e., 11 participants, 25% of the sample from the “Control Group”, vs. 33 families, 75% of the sample from the original “Coaching Group”). Importantly, however, the two parental language behaviors that were manipulated by the Infancy Intervention (parentese and turn-taking between 6 and 24 months) did not differ between the 33 participating children enrolled in the infancy study as “Intervention” and the 11 participating children originally enrolled in the infancy study as “Control” (both ps>.05), even though the same two behaviors were demonstrated to be enhanced by the Infancy Intervention when the whole sample of 77 participants was considered (Ferjan Ramírez et al., 2018, 2020). 
For this reason, and because the goal of the present study was to examine the correlation between language input in infancy and language outcomes at age 5 years, the data were analyzed for the whole available sample as a single group of 44 families. Socio-economic status (SES) was measured with the Hollingshead Index and ranged from 30 to 66 in the final analyzed sample (M = 49.4, SD = 10.5), i.e., working- to upper-middle-class families.
Data collection, preparation, and annotation
The home Language Environment Analysis System (LENA) was used to collect naturalistic first-person recordings from all families over two weekend days when children were 6, 10, 14, 18, and 24 months old (Infancy) and then again at 5 years of age. At each timepoint, all families received two LENA recorders in the mail and were instructed to use one recorder on each day of a “typical” weekend, defined as two consecutive days when both parents were home and not working. Parents were asked to start each recording in the morning when the child woke up, go about their day as usual, and turn off the recorder at night when the child went to sleep. Recordings across all ages varied in length between 8 and 16 hours, with an average of 12 hours and 47 minutes at 6 months, 13 hours and 4 minutes at 10 months, 13 hours at 14 months, 12 hours and 42 minutes at 18 months, 12 hours and 50 minutes at 24 months, and 13 hours and 17 minutes at 5 years.
Parent and child speech were quantified through a combination of automatic annotation by LENA software and manual (human) annotation. The LENA software produces an automatic count of child vocalizations (child vocalization count, CVC), words produced by nearby adults (adult word count, AWC), and adult-child conversational turns (conversational turn count, CTC). Recent studies have sought to assess and validate LENA’s classification performance (e.g., Cristia, Bulgarelli & Bergelson, 2020; Lehet, Arjmandi, Houston & Dilley, 2020; Wang, Williams, Dilley & Houston, 2020). According to one meta-analysis, LENA achieves a mean recall and precision of 0.59 and 0.68, respectively, for recognizing adult words and a mean recall of 0.77 for recognizing child vocalizations (Cristia et al., 2020). LENA’s CTC measure looks for adult and child speech in close temporal proximity but, critically, does not differentiate between child-directed and overheard speech. This means that an unknown proportion of LENA’s CTCs are identified in error, such as when a parent is talking on the phone and the infant is babbling to herself nearby (i.e., accidental contiguity). The frequency of accidental contiguity has recently been shown to be high (Ferjan Ramírez et al., 2021), leading us to limit our analysis of turn-taking to manually identified conversational turns. However, we do rely on LENA’s automatically identified AWC and CVC (see Table 1 for definitions). AWCs are used to approximate the amount of adult speech heard by the infant, and CVCs are used as a measure of child volubility in infancy. Note that AWC is an estimate of all speech occurring near the child wearing the recorder, including all adult speech directed to the child and all adult speech overheard by the child. 
CVC is an estimate of any articulations that originate from the vocal tract of the child, except for fixed signals (screams, cries), sounds related to respiration (breaths) or digestion (burps; see Xu, Richards & Gilkerson, 2014; Gilkerson et al., 2017 for a more detailed description of these metrics). Early communication skills are a strong predictor of later language ability, so CVC was considered in order to explore whether children’s own volume of speech-related vocalizations between 6 and 24 months may predict later language outcomes. However, note that CVC is a purely quantitative measure that does not differentiate between different kinds of language-related vocalizations that infants are known to produce between 6 and 24 months (i.e., a vocalization such as “ba” and a full multi-word utterance such as “I want a cookie” would both be counted as one CV). A higher CVC value thus indicates a higher volume of child speech, but not necessarily more complex speech.
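For readers less familiar with the recall and precision figures cited above for LENA’s automatic counts, the following minimal Python sketch shows how the two metrics are defined. The function name and the example counts are hypothetical and chosen only for illustration; they are not data from the validation studies cited above or from the present study.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) from detection counts.

    precision: share of items the system labelled as the target class
               (e.g., adult speech) that really belong to that class.
    recall:    share of true target-class items that the system found.
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts for illustration only (not real LENA data).
precision, recall = precision_recall(true_pos=60, false_pos=20, false_neg=40)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case, a recall below precision would mean that the system misses more true adult speech than it wrongly labels as adult speech.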
The LENA audio files were further processed using the Advanced Data Extractor Tool (ADEX) for the purposes of manual annotation. For identification of parentese (6-24 months) and conversational turns (6-24 months and 5 years), the same procedures were followed as in previously published studies (Ferjan Ramírez et al., 2018, 2020, 2021; Ramírez-Esparza et al., 2014, 2016, 2017). In brief, ADEX was used to identify intervals with the language activity of interest (high AWC), in order to avoid coding when there is no social or linguistic activity (for example, during naps). Each participant’s two daily recordings were segmented into 30-second intervals. For each of the two recording days, 50 intervals with the highest adult word count that were at least 3 minutes apart were selected, yielding a total of 100 30-second coding intervals per participant. Ten research assistants listened to each 30-second interval and determined the presence or absence of parentese speaking style, and counted the number of conversational turns (CTs), using the same audio files, training, and reliability assessment as described by Ramírez-Esparza and colleagues (2014, 2016, 2017) and Ferjan Ramírez and colleagues (2018, 2020, 2021). Note that child-directed speech consists of parentese and standard speech. The focus of the present study is on the long-term effects of parentese, which is distinguished from standard child-directed speech by its acoustic features. 
To identify parentese and distinguish it from standard child-directed speech, the same criteria were adopted as described previously by Ramírez-Esparza and colleagues (Reference Ramírez-Esparza, García-Sierra and Kuhl2014, Reference Ramírez-Esparza, García-Sierra and Kuhl2016, Reference Ramírez-Esparza, García-Sierra and Kuhl2017) and Ferjan Ramírez and colleagues (Reference Ferjan Ramírez, Lytle, Fish and Kuhl2018, Reference Ferjan Ramírez, Lytle and Kuhl2020, Reference Ferjan Ramírez, Hippe and Kuhl2021). Ramírez-Esparza et al. (Reference Ramírez-Esparza, García-Sierra and Kuhl2014) independently verified that the intervals defined as parentese or standard speech contained the acoustic differences characteristic of these two speech styles (i.e., higher pitch and larger pitch range for parentese). These analyses examined 60 occurrences of the word ‘you’, comprising 30 pairs (30 produced as parentese and 30 as standard speech), each pair produced by the same adult addressing the same infant. Mean pitch and pitch range were significantly higher for parentese than for standard speech (ps < 0.001; see Table 1 for variable definitions).
For tabulating the number of CTs within each 30-s segment, the same procedures were followed as in Ferjan Ramírez et al. (Reference Ferjan Ramírez, Hippe and Kuhl2021). In brief, as with the LENA algorithm, CTs were counted in discrete pairs, and pauses of 5 s or more constituted the end of a conversation. Critically, and unlike the LENA algorithm, cases of accidental contiguity were not counted as CTs. The total number of CTs was then counted across all 100 intervals for each participant. After training, all coders were tested independently with a training file from Ramírez-Esparza and colleagues (Reference Ramírez-Esparza, García-Sierra and Kuhl2014) and a training file from the present dataset, used to evaluate inter-coder reliability. The reliability analysis produced an average intra-class correlation of 95% for parentese and 97% for CTs, indicating effective training and reliable coding.
Language complexity at 5 years was assessed by measuring children’s mean length of utterance (MLU) in morphemes, and lexical diversity (LexDiv) by manually annotating the recordings collected at 5 years of age. The following procedures were used: for each child, both 5-year daylong audio recordings were segmented into one hour-long segments. Using ADEX, the hours from both days were then arranged from highest to lowest in CVC. Recall that CVC represents an estimated number of meaningful child speech utterances of any length, and can include babbling, individual words, or sentences. The hour with the highest CVC was used for analyses of MLU and LexDiv in order to be able to consistently obtain a sample of 100 consecutive utterances for analyses from each child.
Nine research assistants, all native speakers of English, transcribed the dataset. They were first trained on the protocol, transcribed two training files, and received feedback on their transcription from their supervisor, the third author of the manuscript (K. Sheth). Within the highest CVC hour selected for transcription, the utterances were segmented into communication units (C-units; Miller, Andriacchi & Nockerts, Reference Miller, Andriacchi and Nockerts2019), each of which includes one main clause with all subordinate clauses and cannot be further divided without losing its meaning. For example, [He went to the store because he was out of milk] would be considered one C-unit, but [He climbed up on the branches] [but they weren’t branches] would be considered two C-units. C-units were chosen as a consistent way of segmenting utterances across all participants. Following the standard Systematic Analysis of Language Transcripts (SALT) conventions (Miller et al., Reference Miller, Andriacchi and Nockerts2019), the first 100 C-units in the highest CVC hour were transcribed for each participant. Within these 100 C-units, transcribers also completed a morpheme transcription, marking the bound morphemes as specified in the standard SALT Conventions (Miller et al., Reference Miller, Andriacchi and Nockerts2019). For example, to indicate regular plural inflections, the bound morpheme /s was used (EX: frog/s, tree/s). To indicate third person singular verb inflections, the bound morpheme /3s was used (EX: he look/3s, she jump/3s). Each 100-utterance transcription was then checked by a second transcriber and marked for any disagreements or errors. The transcription supervisor (K. Sheth) then resolved any disagreements. Utterances with excessive noise that were impossible to reliably transcribe were removed from analyses. In two participants, the entire highest CVC hour was too noisy to allow for reliable transcription (i.e., lots of overlapping voices from other children). In both cases, the second highest CVC hour was used for analyses instead.
For each participant, the 100 transcribed C-units were then further analyzed to derive the MLU in morphemes and LexDiv. MLU was calculated from the morpheme transcription, using formulas in Microsoft Excel to obtain the average number of morphemes per C-unit across the 100 C-units for each participant. LexDiv represents the number of unique lexical types across all 100 C-units for each participant, and was calculated using a Python (Version 3.8) (Van Rossum & Drake, Reference Van Rossum and Drake2009) script with the Natural Language ToolKit (NLTK) (Bird, Loper & Klein, Reference Bird, Loper and Klein2009). For example, he goes counts as two unique words. A subsequent occurrence of he or goes from the same participant would not be counted as an additional unique lexical item, but words such as go, went, him, or his would each count as additional unique lexical items. In other words, our measure of unique lexical types refers to word forms rather than lemmas.
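To make the two measures concrete, the following sketch computes MLU in morphemes and a count of unique word forms from a handful of C-units. It is an illustration only and does not reproduce the Excel formulas or the NLTK-based script used in the study. It assumes that each whitespace-separated word contributes one free morpheme, that each "/" marks one bound morpheme (as in frog/s or look/3s), and that lexical types are counted on lowercased orthographic word forms. The function names and the toy utterances are hypothetical.

```python
from typing import Iterable


def count_morphemes(salt_c_unit: str) -> int:
    """Count morphemes in one SALT-style C-unit.

    Simplifying assumption: every word is one free morpheme and every '/'
    marks exactly one bound morpheme. Other SALT codes are not handled.
    """
    return sum(1 + word.count("/") for word in salt_c_unit.split())


def mlu_in_morphemes(salt_c_units: Iterable[str]) -> float:
    """Mean number of morphemes per C-unit."""
    units = list(salt_c_units)
    return sum(count_morphemes(u) for u in units) / len(units)


def lexical_diversity(plain_c_units: Iterable[str]) -> int:
    """Number of distinct word forms (not lemmas) across all C-units."""
    types = set()
    for unit in plain_c_units:
        for token in unit.lower().split():
            token = token.strip(".,!?;:\"'")  # drop edge punctuation only
            if token:
                types.add(token)
    return len(types)


# Toy input (hypothetical utterances, not study data).
salt_units = ["he look/3s", "the frog/s jump"]
plain_units = ["he looks", "the frogs jump", "he goes"]

print(mlu_in_morphemes(salt_units))    # (3 + 4) / 2 morphemes per C-unit
print(lexical_diversity(plain_units))  # he, looks, the, frogs, jump, goes
```

Because types are counted on word forms, he occurring twice adds only one type, whereas looks and goes are counted separately, mirroring the he goes example above.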
Because the LENA recordings varied in duration, projected 12-h values were used for all LENA automatic measures (AWC, CVC). The 12-h projections are generated automatically for recordings at least 10h in length, and represent the interpolated values for AWC and CVC at the 12-h mark for the day’s recording. The values for “manual” variables are based on 100 30-s segments for parentese and CTC, and on 100 consecutive utterances for MLU and LexDiv, annotated by humans as described above.
Results
Descriptive statistics
The descriptive statistics for all raw data can be found in Table 2. Across all infancy timepoints (6, 10, 14, 18, and 24 months), infants heard an estimated average of 16,424 words per day, and produced an average of 1,747 vocalizations per day. An average of 54% of the coded intervals contained parentese speaking style.
We created a correlation matrix between parentese, CTC, and AWC measured at 6, 10, 14, 18, and 24 months, and the three 5-year outcome measures: CTC, LexDiv and MLU (Table 3). We use α = 0.05 as denoting statistical significance, and α = 0.1 as denoting marginal significance. With the sample size of 44 and a 0.8 power, both of these α values allow us to reliably capture medium to large effect sizes (Serdar, Cihan, Yücel & Serdar, Reference Serdar, Cihan, Yücel and Serdar2021).
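As a rough check on the sensitivity statement above, the smallest correlation detectable with a given sample size, α, and power can be approximated with the Fisher z transformation. This sketch is not the calculation reported by the authors (who cite Serdar et al.). It assumes a two-sided test of a single Pearson correlation and uses the normal approximation, and the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n: int, alpha: float, power: float = 0.8) -> float:
    """Approximate smallest Pearson r detectable with the given n, alpha,
    and power, using the Fisher z normal approximation (two-sided test).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value
    z_power = std_normal.inv_cdf(power)          # quantile for desired power
    # The standard error of Fisher's z is 1 / sqrt(n - 3).
    return tanh((z_alpha + z_power) / sqrt(n - 3))


for alpha in (0.05, 0.10):
    print(alpha, round(min_detectable_r(44, alpha), 2))
```

Under these assumptions, a sample of 44 gives a minimum detectable r of roughly 0.41 at α = 0.05 and roughly 0.37 at α = 0.1. Both values fall between Cohen's conventional benchmarks for medium (0.3) and large (0.5) correlations.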
Parental parentese at 6 months, 10 months, 14 months, and 18 months was significantly correlated with 5-year CTC. At 24 months, the correlation between parental parentese and 5-year CTC was marginally significant. Further, parental parentese at 18 months and 24 months was significantly correlated with 5-year LexDiv. At 6 months, the correlation between parental parentese and 5-year LexDiv was marginally significant. Finally, parental parentese at 18 months was significantly correlated with 5-year MLU. At 6 months and 14 months, the correlation between parental parentese and 5-year MLU was marginally significant.
Because parental use of parentese was significantly or marginally significantly correlated with all three 5-year measures at one or more of the timepoints (Table 3), and one of our main questions pertained to the consistency of parental parentese use, we calculated the “global” infancy parentese score, summing the parentese values across all 5 ages (6, 10, 14, 18, and 24 months, Table 2), which serves here as our proxy for “parentese consistency” between 6 and 24 months. To examine the association between parentese consistency between 6-24 months of age and 5-year language measures, we ran unadjusted linear regression models, including the “global parentese” variable, and the three 5-year measures (CTC, LexDiv, MLU). Next, adjusted models were run, which included child volubility across all ages in infancy (“global CVC”, Table 2) as a potential predictor of the 5-year measures. This step was done to examine whether children’s own volubility of speech related vocalizations in infancy may predict later language outcomes.
CTC at 18 and 24 months was significantly correlated with 5-year CTC. The correlation between 10-month CTC and 5-year CTC was marginally significant (Table 3). There were no significant correlations between CTC in infancy and MLU or LexDiv at 5 years. To consider the hypothesis that consistent use of turn-taking in infancy would predict parent-child turn-taking at Kindergarten entry, we considered the association between CTC summed across all 5 ages in infancy (“global CTC”; Table 2) and 5-year CTC, first in an unadjusted model, and then in adjusted models. CTC was not included in the models predicting LexDiv and MLU due to a lack of correlation between CTC and these two variables.
AWC at any one of the timepoints measured in infancy was not significantly or marginally significantly correlated with any one of the 5-year measures (Table 3), and was therefore not included in any one of the models. SES was also not significantly correlated with any one of the 5-year measures (all ps>.26), and was not included in the models.
Predicting 5-year CTC
Across all participating children, the mean number of conversational turns in 100 30-s segments at age 5 years was 168.52 (SD = 62.66, Table 2). The correlation between global parentese and 5-year CTC was significant (r = 0.45, p = 0.002), as was the correlation between global CTC and 5-year CTC (r = 0.34, p = 0.024). To evaluate whether these correlations could be accounted for by the child’s own language characteristics, we added a simultaneously collected, potentially related measure of child language volubility (global CVC). Global CVC was not significantly correlated with global parentese (p = 0.55), but was significantly correlated with global CTC (r = 0.40, p = 0.008).
In Table 4, we display regression metrics for 6 models to predict turn-taking at age 5 years: Model 1: global parentese only; Model 2: global CTC only; Model 3: global CVC only; Model 4: global parentese plus global CTC; Model 5: global parentese plus global CVC; Model 6: global parentese plus global CTC plus global CVC. Note that we are comparing models with different numbers of independent variables. In this scenario, R-squared will increase or stay the same every time an independent variable is added to the model (i.e., it never declines). By contrast, adjusted R-squared increases only when the added independent variable improves the model fit by more than would be expected by chance. Therefore, adjusted R-squared is used to judge which model is “best” (Schroeder, Sjoquist & Stephan, Reference Schroeder, Sjoquist and Stephan2016). The best model in Table 4 is Model 1 (only global parentese), which demonstrates that 18.4% of the variance in 5-year CTC can be explained by parental use of parentese between 6 and 24 months of age (Table 4, Model 1). Adding CTC and/or CVC to the model did not improve the predictive power of global parentese (Table 4, Models 2-6).
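The penalty that adjusted R-squared applies for each additional predictor can be written out directly. The sketch below uses the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k is the number of predictors. As a worked example, it takes the reported correlation between global parentese and 5-year CTC (r = 0.45) and assumes that all 44 participants entered the single-predictor model. The function name is hypothetical.

```python
def adjusted_r_squared(r_squared: float, n: int, n_predictors: int) -> float:
    """Adjusted R-squared for a linear model with n observations and
    n_predictors independent variables (intercept not counted)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Single-predictor model: R-squared equals the squared Pearson correlation.
r = 0.45                 # reported correlation, rounded to two decimals
r_squared = r ** 2       # 0.2025
one_predictor = adjusted_r_squared(r_squared, n=44, n_predictors=1)
print(round(one_predictor, 3))

# Adding a predictor that leaves R-squared unchanged lowers the adjusted value.
two_predictors = adjusted_r_squared(r_squared, n=44, n_predictors=2)
print(round(two_predictors, 3))
```

With these inputs the single-predictor value is approximately 0.184, in line with the 18.4% reported for Model 1 (allowing for rounding of r). The second call shows why a model with more predictors is not automatically judged better under this criterion.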
Predicting 5-year Lexical Diversity
Across all participating children, the mean number of different lexical items produced in 100 consecutive utterances was 175.3 (SD = 24.24; Table 2). Global parentese correlated significantly with 5-year LexDiv (r = 0.326, p = 0.031). However, results from multiple linear regression analyses predicting 5-year LexDiv scores from global parentese when controlling for CVC revealed that the addition of CVC to the model significantly improved on the predictive power of global parentese alone. In Table 5, we display regression metrics for 3 models: Model 1: global parentese only; Model 2: global CVC only; Model 3: global parentese and global CVC. The best model is Model 3, which combines global parentese with CVC, and demonstrates that 17.2% of the variance in Lexical Diversity at age 5 years can be explained by a combination of parental parentese and child volubility between 6 and 24 months of age (Table 5, Model 3). Note, however, that the global CVC’s standard coefficient is negative, meaning that children whose overall speech volubility between 6 and 24 months was lower had higher lexical diversity at age 5 years (see Discussion).
Predicting 5-year MLU
Across all participating children, the mean MLU in morphemes was 4.75 (SD = 0.79; Table 2). Global parentese correlated significantly with 5-year MLU (r = 0.314, p = 0.038). Results from multiple linear regression analyses predicting 5-year MLU scores from global parentese when controlling for CVC revealed that the addition of CVC to the model significantly improved on the predictive power of global parentese alone. In Table 6, we display regression metrics for 3 models: Model 1: global parentese only; Model 2: global CVC only; Model 3: global parentese and global CVC. The best model is Model 3, which combines global parentese with CVC, and demonstrates that 21.7% of the variance in MLU at age 5 years can be explained by a combination of parental parentese and child volubility between 6 and 24 months of age (Table 6, Model 3). Note that the global CVC’s standard coefficient is negative, meaning that children whose overall speech volubility between 6 and 24 months was lower had higher MLU at age 5 years.
Discussion
The present study tested the hypothesis that consistent use of parentese and/or conversational turns in infancy predicts children’s conversation and language outcomes at the age of 5 years, just prior to Kindergarten entry. Supporting our hypothesis, our results show that, at the age of 5 years, children whose parents consistently used high amounts of parentese in infancy demonstrated higher lexical diversity, produced longer sentences, and engaged in higher rates of turn-taking compared to children whose parents used lower amounts of parentese in infancy. Also confirming our hypotheses, the sheer quantity of adult speech in infancy (i.e., AWC) was not associated with children’s conversation or language complexity outcomes at the age of 5 years. Together, these findings support the notion that language input in infancy is the foundation of later language skills (Hirsh-Pasek et al., Reference Hirsh-Pasek, Adamson, Bakeman, Owen, Golinkoff, Pace, Yust and Suma2015; Rodriguez & Tamis-LeMonda, Reference Rodriguez and Tamis‐LeMonda2011; Rowe, Reference Rowe2012), and that associations between qualitative aspects of language input and later child language skills may be stronger than those between the sheer quantity of adult speech and child language skills (Hirsh-Pasek et al., Reference Hirsh-Pasek, Adamson, Bakeman, Owen, Golinkoff, Pace, Yust and Suma2015; Zimmerman, Gilkerson, Richards, Christakis, Xu, Gray & Yapanel, Reference Zimmerman, Gilkerson, Richards, Christakis, Xu, Gray and Yapanel2009). We extend previous findings by proposing that parentese may be a key component of high-quality language input in infancy, especially when delivered at the age at which infants are known to be sensitive to the acoustic properties of the speech signal (Kuhl, Reference Kuhl2004), but may not yet be producing large quantities of speech themselves.
Given the present results, it is important to consider how and why parentese may be helpful for language learning in infancy and beyond. With its exaggerated acoustics, unique syntactic structure, accompanying social behaviors, and exaggerated facial movements (Werker, Pegg & McLeod, Reference Werker, Pegg and McLeod1994), parentese conveys a positive emotion that makes the speaker sound “happy” (Singh et al., Reference Singh, Morgan and Best2002). This combination attracts and holds infants’ attention to the speaker and what they are saying, giving infants ample time to babble in response (Ferjan Ramírez et al., Reference Ferjan Ramírez, Lytle and Kuhl2020). Through such exchanges, which may at first be sparse, and primarily driven by parental parentese, infants adjust their vocalizations when they respond to parental talk, and increase the complexity of their language (Bornstein et al., Reference Bornstein, Tamis-LeMonda and Haynes1999; Braarud & Stormark, Reference Braarud and Stormark2008; Goldstein & Schwade, Reference Goldstein and Schwade2008; Smith & Trainor, Reference Smith and Trainor2008). Parents, in turn, provide contingent feedback that is continuously adjusted to their child’s linguistic needs, thereby creating a positive feedback loop that further promotes language growth through back and forth exchanges in toddlerhood and through the preschool years (Warlaumont et al., Reference Warlaumont, Richards, Gilkerson and Oller2014). The quantity of parent-child conversational turns between 4-6 years of age, a variable corresponding closely to our 5-year CTC, has previously been linked to children’s cognitive performance, as well as the functional and structural measures of language-related brain networks (Romeo et al., Reference Romeo, Leonard, Robinson, West, Mackey, Rowe and Gabrieli2018a, Reference Romeo, Segaran, Leonard, Robinson, West, Mackey, Yendiki, Rowe and Gabrieli2018b, 2021). 
Our findings demonstrate that parental use of parentese between 6 and 24 months accounted for 18.4% of variance in 5-year CTC, suggesting the important role of parentese in infancy in establishing a strong communication foundation prior to Kindergarten entry.
The present study also demonstrates a link between parentese in infancy, particularly between 18 and 24 months, and lexical diversity and sentence length at age 5 years. Parentese may facilitate lexical acquisition by exaggerating acoustical properties such as linguistic focus. For example, mothers of one-year old infants have been shown to highlight target words by placing an exaggerated pitch peak at sentence ends (Aslin, Woodward, LaMendola & Bever, Reference Aslin, Woodward, LaMendola, Bever, Morgan and Demuth1996; Fernald & Mazzie, Reference Fernald and Mazzie1991). Parentese may also be helpful for identifying words because of a preponderance of prosodically isolated single words (Brent & Siskind, Reference Brent and Siskind2001). For example, Brent and Siskind found that these words cross the spectrum of grammatical categories, and are a significant predictor of vocabulary acquisition. Another possible role for phonological properties of parentese is in classifying words by grammatical category, which may be related to the presently-identified link between parentese in infancy and MLU at age 5 years. For example, Shi, Morgan, and Allopena (Reference Shi, Morgan and Allopena1998) found that phonological and acoustic properties of parentese differentiate content and function words, and perceptual experiments have shown that even newborns can differentiate these word categories (Shi, Werker & Morgan, Reference Shi, Werker and Morgan1999). Furthermore, some researchers have proposed that some acoustic properties that are exaggerated in parentese are associated with syntactic boundaries (Kemler Nelson, Hirsh-Pasek, Jusczyk & Cassidy, Reference Kemler Nelson, Hirsh-Pasek, Jusczyk and Cassidy1989). 
In support of this notion, laboratory studies have demonstrated that infants use these cues to group words into syntactically-relevant sequences (Mandel, Jusczyk & Kemler Nelson, Reference Mandel, Jusczyk and Kemler Nelson1994; Mandel, Kemler Nelson & Jusczyk, Reference Mandel, Kemler Nelson and Jusczyk1996; Nazzi, Kemler Nelson, Jusczyk & Jusczyk, Reference Nazzi, Kemler Nelson, Jusczyk and Jusczyk2000).
Importantly, for both lexical diversity and utterance length, our results indicate that the predictive power of parentese was improved when the child’s own volubility was added to the models. Specifically, parentese and infant language volubility between 6 and 24 months of age together accounted for 17.2% and 21.7% of the variance in lexical diversity and MLU at 5 years, respectively. Of note, infant speech volubility contributed in the negative direction (i.e., children whose overall volume of speech between 6 and 24 months was lower had higher lexical diversity and MLU at age 5 years). One potential interpretation of this finding is that associations between parentese and later language complexity may be particularly strong in children who are less “chatty” (i.e., children who are not (yet) producing large quantities of speech themselves). It may be that parentese acts as a mechanism that “drives” infants into later conversational exchanges, and especially so if their own volume of talk is low; future studies with larger numbers of infants who vary in their levels of “chattiness” will be necessary to further unpack this relation. Particularly interesting would be studies using methodologies that capture non-verbal interactions between parents and children, in addition to their verbal exchanges. For example, studies have found that parental contingent comments during passive joint engagements (when infants do not produce speech themselves, but are looking at the same object as the parent) were positively associated with children’s later language outcomes (Rollins, Reference Rollins2003; Trautman & Rollins, Reference Trautman and Rollins2006). 
Similarly, infants are known to activate their motor brain areas in distinct ways to speech compared to non-speech sounds several months prior to generating intelligible speech (Kuhl, Ramírez, Bosseler, Lin & Imada, Reference Kuhl, Ramírez, Bosseler, Lin and Imada2014), suggesting that they are creating internal speech motor models of native language in response to parental speech, even when they may not yet be talking themselves. Furthermore, it is well documented that infants in the age range studied here (6-24 months) often use gesture to reveal knowledge that they cannot yet express in speech (Goldin-Meadow, Reference Goldin‐Meadow2009). In fact, gestures in infancy are known to selectively predict children’s lexical skills at age 3.5 years, even with early child speech controlled (Rowe & Goldin-Meadow, Reference Rowe and Goldin-Meadow2009a, Reference Rowe and Goldin-Meadow2009b). Furthermore, previous research has found that child gesture types, rather than child gesture volume (frequency), predict later spoken language vocabulary size (Rowe, Ozçalişkan & Goldin-Meadow, Reference Rowe, Ozçalişkan and Goldin-Meadow2008). Along similar lines, it may be the case that children’s later language skills are predicted by vocalization types in infancy, which were not measured in the present study (recall that CVC is a purely quantitative measure). It is also possible that maternal and paternal parentese (analyzed in the present study together as “parental” parentese) are in fact associated with infant vocalizations in distinct ways (Shapiro et al., Reference Shapiro, Hippe and Ferjan Ramírez2021). 
Finally, it is important to acknowledge that naturalistic daylong recordings do not control for contextual or conversational style differences, both of which have been shown to play a role in shaping how language input benefits child language growth (Crain-Thoreson, Dahlin & Powell, Reference Crain-Thoreson, Dahlin and Powell2001; Yoder & Kaiser, Reference Yoder and Kaiser1989). For example, conversational styles during toy play, dressing, feeding, and book reading are known to differ in their rates of behavior directives vs conversation eliciting utterances (Hoff-Ginsberg, Reference Hoff-Ginsberg1991), and different types of parental utterances have been shown to predict different emergent language skills (Crain-Thoreson et al., Reference Crain-Thoreson, Dahlin and Powell2001; Reese, Reference Reese1995). Future studies will thus have to unpack the relation between parental parentese and the complexities of children’s linguistic vocalizations between 6 and 24 months, across different contexts, and separating the contributions of various caregivers (i.e., mothers vs fathers; see Shapiro et al., Reference Shapiro, Hippe and Ferjan Ramírez2021).
Contrary to our hypotheses, in the present dataset, consistent turn taking between 6 and 24 months of age was not identified as a significant predictor of any of the 5-year outcomes. This is somewhat surprising, considering that one previous study has linked turn-taking between 18 and 24 months to language outcomes 10 years later (Gilkerson et al., Reference Gilkerson, Richards, Warren, Oller, Russo and Vohr2018). There are multiple possible interpretations for this apparent discrepancy. First, it is important to note that parentese between 6 and 14 months has previously been proposed to enhance turn-taking at 18 months (Ferjan Ramírez et al., Reference Ferjan Ramírez, Lytle and Kuhl2020). Considering this finding, it is possible that the predictive power of turn-taking at 18-24 months reported by Gilkerson and colleagues could, in fact, be attributed to parental use of parentese in the first year of life, which was not measured in that study. That is, parentese may act as a mechanism that initially “drives” infants to engage in conversational exchanges, which have been demonstrated to change in quantity and quality somewhere around 18-24 months of age (Gilkerson et al., Reference Gilkerson, Richards, Warren, Oller, Russo and Vohr2018). For example, prior to 18 months of age, children rarely produce combinatorial speech. Then, around 18 months of age, the first word combinations are typically produced, and children’s vocabularies increase rapidly. While the existence of a “word spurt” is debatable (Dapretto & Bjork, Reference Dapretto and Bjork2000; but see Ganger & Brent, Reference Ganger and Brent2004), researchers agree that the rate of word learning and landscape changes in child-language use are observed in the second half of the second year of life (Fromkin, Rodman & Hyams, Reference Fromkin, Rodman and Hyams2013). 
Correspondingly, normative data collected in children’s naturalistic environments suggest that the rate of turn-taking between parents and children increases rapidly between 18 and 24 months of age (Gilkerson & Richards, Reference Gilkerson and Richards2009), perhaps to the level where turn-taking finally becomes a reliable predictor of later language outcomes. However, our present data, in combination with previous findings (Ferjan Ramírez et al., Reference Ferjan Ramírez, Lytle and Kuhl2020), suggest that these enhancements in turn-taking may be partly related to earlier use of parental parentese.
It is also important to point out that conversational turns were measured differently in the study by Gilkerson and colleagues compared to the present study. Specifically, Gilkerson et al. (Reference Gilkerson, Richards, Warren, Oller, Russo and Vohr2018) used LENA’s automatic estimate of turn-taking at the daylong level, while in the present study, conversational turns were annotated manually in 100 30-second segments per participant in order to exclude cases of accidental contiguity, which are abundant in the age range studied here (Ferjan Ramírez et al., Reference Ferjan Ramírez, Hippe and Kuhl2021). However, while manual annotation is more accurate, the downside is that it is limited to only a portion of the daylong recording, and as such may not be sensitive enough to capture significant relations with later language outcomes. Enhancements in conversational turns in infancy have previously been linked to larger vocabularies in infancy (Ferjan Ramírez et al., Reference Ferjan Ramírez, Lytle and Kuhl2020). Furthermore, turn-taking in infancy has recently been proposed to shape the circuitry of language-related brain networks at age 26 months (Huber, Corrigan, Yarnykh, Ferjan Ramírez & Kuhl, Reference Huber, Corrigan, Yarnykh, Ferjan Ramírez and Kuhl2023), suggesting that the physiology facilitating the back-and-forth interactions is indeed being set up in infancy. Future studies using larger datasets will have to look further into conversational turns and their associations with behavioral measures of language and cognition at Kindergarten entry and beyond. However, the results of the present study, along with previous research (Ferjan Ramírez et al., Reference Ferjan Ramírez, Lytle and Kuhl2020; Kuhl, Reference Kuhl2004), suggest that parentese may be the key to igniting “conversation” in infancy, toddlerhood, and beyond.
Finally, it is important to acknowledge that, while the consistency of high-quality parental input across the first two years of life may be an important contributing factor to Kindergarten language skills, the precise nature of high-quality parental language behaviors may change over the course of development. For example, Rowe (Reference Rowe2012) has previously demonstrated that using a diverse and sophisticated vocabulary may be particularly beneficial for toddlers, while using decontextualized language may be particularly beneficial for preschoolers. Likewise, the present dataset suggests that parental use of parentese may have differential effects within the infancy period studied here (6 to 24 months). The exact timing at which parental behaviors have particularly strong associations with specific child language outcomes will have to be further investigated with carefully designed interventions that enroll a sufficiently high number of participants in a single study spanning an age range from infancy to Kindergarten. Unfortunately, this was not possible here, due to the exploratory nature of the present study (i.e., data collection at 5 years was not pre-planned), and due to its correlational design. Future pre-planned and pre-registered studies with higher numbers of participants will also allow for more stringent α levels. In the present study, we use α = 0.05 as denoting statistical significance, and α = 0.1 as denoting “marginal significance”, allowing us to reliably capture medium to large effect sizes. 
It is important to acknowledge that, in most fields, α = 0.05 has been used as the gold standard cutoff denoting statistical significance (Miller & Ulrich, Reference Miller and Ulrich2019), although researchers agree that this cutoff is arbitrary, that there are good reasons to believe that no single α level is optimal for all research, and that there are certainly contexts, such as exploratory or preliminary studies, where higher α levels can be appropriate (Michaels, Reference Michaels2017; Serdar et al., Reference Serdar, Cihan, Yücel and Serdar2021). We chose α = 0.1 to denote marginal significance because of the exploratory nature of our study (see Serdar et al., Reference Serdar, Cihan, Yücel and Serdar2021). While this inevitably leads to an increased likelihood of a false positive (i.e., detecting an effect when it is not there in the full population), it also decreases the likelihood of a false negative (i.e., not detecting an effect when it is there in the full population), which we consider important, given the scarcity of previous research on the long-term effects of parentese.
In addition to the exploratory nature and use of a correlational design in the present study, there are other limitations that should be considered. The current sample was originally part of a longitudinal, randomized control study designed to establish the efficacy of a parent coaching intervention for enriching parental language input (Ferjan Ramírez et al., Reference Ferjan Ramírez, Lytle, Fish and Kuhl2018, Reference Ferjan Ramírez, Lytle and Kuhl2020). The goal of the Intervention was to examine the effect of changing parents’ behavior, while holding other factors constant across participants. Therefore, the sample was intentionally homogenous and included children raised by predominantly White, English-speaking mothers and fathers, within middle to upper-middle SES households in Washington state. While this helped to assure that the Coaching and Control groups were closely matched through the first 24 months of age, it is important to acknowledge that this demographic may exhibit different patterns of language input compared to families who are not represented in the current study. Intriguingly, recent cross-cultural studies suggest that non-Western children attain developmental milestones on a similar timeline as Western children, even in cultures where parentese use and turn-taking between parents and children may be rare (see Casillas, Brown & Levinson, Reference Casillas, Brown and Levinson2020, Reference Casillas, Brown and Levinson2021). It has also been argued that parentese and turn-taking between caregivers and children are simply a teaching model preferred by middle- and upper-SES families (Avineri, Johnson, Brice‐Heath, McCarty, Ochs, Kremer‐Sadlik, Blum, Zentella, Rosa, Flores, Alim & Paris, Reference Avineri, Johnson, Heath, McCarty, Ochs, Kremer‐Sadlik, Blum, Zentella, Rosa, Flores, Alim and Paris2015; Sperry, Sperry & Miller, Reference Sperry, Sperry and Miller2018). 
However, back-and-forth conversational interactions between caregivers and children have been shown to predict language outcomes within socio-economic groups, including lower-income samples (see Golinkoff et al., 2019; Hirsh-Pasek et al., 2015; Masek, Paterson, Golinkoff, Bakeman, Adamson, Owen, Pace & Hirsh-Pasek, 2021). Additionally, rates of interactive speech have been shown to vary with rates of child-initiated speech, child vocabulary size, and processing speed (Shneidman & Goldin-Meadow, 2012; Weisleder & Fernald, 2013), as well as with child-initiated communicative behavior (Salomo & Liszkowski, 2013), within and across cultures, suggesting that differences in qualitative aspects of caregiver input still affect the sophistication of child language. Nevertheless, future research should assess the use of parentese and turn-taking and their predictive power in more diverse populations, such as non-White families, multilingual and non–English-speaking households, single-parent families, and families with same-sex parents.
In summary, the present study provides evidence that parents’ consistent use of the speech style that is elicited naturally at home when parents speak to their children – parentese – can predict long-term language outcomes in children at 5 years of age, while the sheer quantity of language input cannot. Our findings suggest that consistent parental use of parentese speech shows highly robust patterns of association with future language across a large age range, from 6 months of age to 24 months of age. This buttresses the idea that language skills are malleable, and that parentese speech in the natural social context in which it is delivered may present an ideal catalyst for language learning, a finding that should be incorporated into theoretical accounts of language acquisition, and one with implications for parents and society.
Acknowledgements
This study was supported by the Overdeck Family Foundation and the University of Washington’s Language Acquisition and Multilingualism Endowment. The authors thank Lili Correa, Julia Mizrahi, Denise Padden and Neva Corrigan for valuable assistance throughout study design, data collection, and analyses.
Competing interests
The authors declare none.