Relating the prosody of infant-directed speech to children’s vocabulary size

Mengru HAN; Nivja H. DE JONG; René KAGER

doi:10.1017/S0305000923000041


Published online by Cambridge University Press: 09 February 2023


Mengru HAN: Department of Chinese Language and Literature, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China; Utrecht Institute of Linguistics (OTS), Utrecht University, Trans 10, 3512 JK Utrecht, the Netherlands; Language, Cognition, and Evolution Lab, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China
Nivja H. DE JONG: Leiden University Center for Linguistics (LUCL), Leiden University, Van Wijkplaats 4, 2311 BX Leiden, the Netherlands; Leiden University Graduate School of Teaching (ICLON), Leiden University, Kolffpad 1, 2333 BN Leiden, the Netherlands
René KAGER*: Utrecht Institute of Linguistics (OTS), Utrecht University, Trans 10, 3512 JK Utrecht, the Netherlands
*Corresponding author: René Kager, E-mail: R.W.J.Kager@uu.nl


Abstract

This study examines correlations between the prosody of infant-directed speech (IDS) and children’s vocabulary size. We collected longitudinal speech data and vocabulary information from Dutch mother-child dyads with children aged 18 (N = 49) and 24 (N = 27) months old. We took speech context into consideration and distinguished between prosody when mothers introduce familiar vs. unfamiliar words to their children. The results show that IDS mean pitch predicts children’s vocabulary growth between 18 and 24 months. In addition, the degree of prosodic modification when mothers introduce unfamiliar words to their children correlates with children’s vocabulary growth during this period. These findings suggest that the prosody of IDS, especially in word-learning contexts, may serve linguistic purposes.

Keywords

Infant-directed speech; prosody; lexical development

Type: Article
Information: Journal of Child Language, Volume 51, Issue 1, January 2024, pp. 217–233

DOI: https://doi.org/10.1017/S0305000923000041
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Introduction

Child language acquisition during the first years of life benefits from a rich language environment. Recent literature has seen increased interest in understanding exactly which qualitative aspects of language input are relevant to children’s language outcomes (Blom & Soderstrom, 2020). Infant-directed speech (IDS) is an important type of input in children’s early language development. Compared to adult-directed speech (ADS), IDS is primarily characterized by its relatively exaggerated prosody (see reviews in Cristia, 2013; Soderstrom, 2007). The prosody of IDS has often been hypothesized to be beneficial to language acquisition, but whether there is a relationship between IDS prosody and children’s language outcomes remains an open question. The current study set out to examine correlations between the prosody of IDS and children’s concurrent vocabulary as well as longitudinal vocabulary growth. In particular, we consider the different effects of IDS prosody when mothers introduce familiar vs. unfamiliar words to children.

The role of prosodic input in children’s lexical development

Extensive research suggests that prosody plays an important role in early language acquisition. Prosody is a major aspect of language and serves linguistic functions at both the word level (lexical tone and stress) and phrase level (intonation) in languages around the world. Infants are sensitive to both word and phrasal prosody from birth (e.g., Christophe, Mehler, & Sebastián-Gallés, 2001; Nazzi, Floccia, & Bertoncini, 1998), and they may use it to bootstrap lexical and morphosyntactic learning, a process known as “prosodic bootstrapping” (Gervain, Christophe, & Mazuka, 2020). The most prominent feature of IDS is its distinctive prosody, including a higher pitch, a larger pitch range, and a slower speaking rate compared to ADS. Such prosodic exaggeration is found in many languages such as American English, German, Dutch, Mandarin Chinese, and Thai (see reviews in Golinkoff, Can, Soderstrom, & Hirsh-Pasek, 2015; Soderstrom, 2007).

Researchers have proposed three functions of IDS: attracting infants’ attention, conveying positive affect, and facilitating language acquisition (Spinelli, Fasolo, & Mesman, 2017). The attentional and affective functions of IDS are related to its exaggerated prosody (Cooper & Aslin, 1994; Trainor, Austin, & Desjardins, 2000; but see Singh, Morgan, & Best, 2002). However, whether the prosody of IDS serves specific linguistic functions is still a matter of much debate. In a meta-analysis, Spinelli et al. (2017) examined the role of IDS prosody in language acquisition during the first two years of life. Their results suggest that prototypical IDS prosody has a much greater effect on attentional and pre-linguistic aspects, such as eliciting vocal responses, than on linguistic outcomes. Prototypical IDS prosody has also been shown to facilitate children’s word learning in laboratory settings, including word segmentation, word recognition, and word-to-object mapping (Ma, Golinkoff, Houston, & Hirsh-Pasek, 2011; Mani & Pätzold, 2016; Thiessen, Hill, & Saffran, 2005). This line of research compares children’s word-learning performance between ADS and IDS conditions: children hear auditory stimuli that have similar speech content but are produced with either ADS or prototypical IDS prosody. However, these studies cannot fully account for the role of IDS prosody in children’s lexical development. First, most of these studies tested overall effects of IDS versus ADS, and only one study has investigated which acoustic cues in IDS might support word recognition (Song, Demuth, & Morgan, 2010). Its findings suggest that a slow speaking rate and vowel hyperarticulation, but not a wide pitch range, significantly improved children’s word recognition. Second, conclusions from these online word-learning experiments often rely on group differences rather than on individual differences in children’s prosodic input.

The quantity and quality of input show great individual variation, and substantial research has investigated the links between the quantity and quality of individual mothers’ language input and children’s language outcomes. It is well established that the quantity of language input a child receives in the early years (e.g., number of words) is associated with his or her lexical development (e.g., Hart & Risley, 1995; Hoff & Naigles, 2002; Ramírez-Esparza, García-Sierra, & Kuhl, 2014). As for input quality, studies have shown that lexical richness, syntactic complexity, repetitiveness, and vowel hyperarticulation are related to children’s vocabulary size (Hartman, Bernstein Ratner, & Newman, 2017; Hoff & Naigles, 2002; Kalashnikova & Burnham, 2018; Newman, Rowe, & Bernstein Ratner, 2016).

However, even though IDS is distinguished from ADS primarily by its exaggerated prosody, the association between prosodic quality and children’s language outcomes has been studied less. In fact, a recent meta-analysis of studies published up to July 2017 on the links between the quantity and quality of linguistic input and children’s language outcomes did not describe a single study that focused on the role of prosody (Anderson, Graham, Prime, Jenkins, & Madigan, 2021). So far, only a few studies have examined the links between individual mothers’ IDS prosody and children’s language outcomes such as vocabulary size, and the findings are mixed. There is evidence that the percentage of time parents use a prototypical infant-directed speaking style is a significant predictor of children’s concurrent speech production and later vocabulary size (Ramírez-Esparza et al., 2014). Furthermore, some specific prosodic cues have been found to be linked to children’s vocabulary size. For example, Porritt, Zinser, Bachorowski, and Kaplan (2014) found that F0 range in IDS was positively correlated with 3- to 14-month-old infants’ expressive vocabulary percentile scores. Raneri, von Holzen, Newman, and Bernstein Ratner (2020) recently found that a slow speaking rate in IDS at seven months predicts a larger expressive vocabulary at two years of age. However, Song, Demuth, and Morgan (2018) did not find any significant correlations between the prosody of individual mothers’ IDS (mean pitch and pitch range) at 17 months and children’s vocabulary size at 19 or 25 months.

Kalashnikova and Burnham (2018) investigated whether three components of IDS, namely vowel hyperarticulation, pitch, and affect, predicted children’s vocabulary size at later ages. Rather than using raw prosodic values as predictors, this study used a “hyper-score” measure. The authors measured vowel triangle areas, mean F0 of vowels, and affect scores (rated by native speakers) in IDS addressed to children at 7, 9, 11, 15, and 19 months of age, as well as in ADS. For each of the three factors, a hyper-score was obtained by dividing each mother’s IDS score by her corresponding ADS score. These hyper-scores indicate the degree of modification in IDS compared to ADS for each participating mother. Their results show that only vowel hyper-scores at 9 months and beyond significantly correlated with children’s expressive vocabulary size at 15 and 19 months, while neither pitch nor affect hyper-scores predicted children’s vocabulary size. The authors concluded that vowel hyperarticulation, but not generally exaggerated pitch or positive affect, plays a role in lexical development.

Taken together, previous studies have yielded inconsistent results regarding whether the prosody of IDS can predict children’s vocabulary size and which prosodic parameters of IDS are correlated with children’s vocabulary size.

IDS prosody in word-learning contexts

In correlational studies on the relationship between IDS prosody and children’s vocabulary size, prosody is often measured at the global level without taking speech context into consideration. Word-learning contexts are defined as situations in which mothers introduce unfamiliar words to children. Such contexts can be assumed to provide the most direct input for children learning novel words and are thus crucial for word learning. Recent studies have found that mothers modify their speech prosody when introducing unfamiliar words, as compared to familiar words, to children (Han, de Jong, & Kager, 2020, 2021). These prosodic modifications were evident across languages, although the specific prosodic cues that were modified varied among the languages investigated. In particular, Dutch mothers of 18- and 24-month-old children used a lower pitch and a slower articulation rate when introducing unfamiliar words compared to familiar words, whereas Mandarin-Chinese-speaking mothers raised their pitch for 18-month-old children and expanded their pitch range for 24-month-olds. These findings indicate that mothers not only exaggerate their prosody at a global level but also modify their speech prosody in word-learning contexts. Thus, even if the generally exaggerated prosody of IDS is not reliably associated with children’s vocabulary size, the prosody of IDS in word-learning contexts may still be related to it. In this study, we therefore examine the relationship between the prosody of IDS in word-learning contexts and children’s language outcomes.

The current study

As illustrated above, it is still unclear whether there is a correlation between IDS prosody and children’s vocabulary size. Moreover, no study has investigated whether IDS prosody in word-learning contexts predicts children’s vocabulary size. The overarching goal of this study is to determine whether individual mothers’ IDS prosody is associated with their child’s vocabulary size, both concurrently and longitudinally. Crucially, we take speech context into consideration and examine mothers’ prosody when introducing unfamiliar words to children as a predictor of children’s vocabulary size. As we are specifically interested in the role of IDS in children’s word learning, we opted to test children longitudinally at both 18 and 24 months, a period during which children’s vocabulary increases rapidly (Goldfield & Reznick, 1990) and their word-learning ability improves significantly (Bion, Borovsky, & Fernald, 2013).

There are two ways to measure individual differences in children’s prosodic input: raw prosodic values (e.g., Raneri et al., 2020) and prosodic hyper-scores (Kalashnikova & Burnham, 2018). The raw prosodic values measure the prosody of the IDS the children hear. The hyper-scores are calculated by dividing raw IDS prosodic values by ADS values, and indicate the degree of prosodic modification in IDS compared to ADS. We use both raw prosodic values and prosodic hyper-scores as prosodic predictors and calculate these per utterance.
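To make the two predictor types concrete, here is a minimal Python sketch. It assumes that per-utterance measures are first averaged per mother within each Register × Familiarity cell (as described under Prosodic measures) and that the hyper-score is then the IDS mean divided by the ADS mean. The record layout, field names, and numbers are hypothetical.

```python
from statistics import mean

# Prosodic measures stored per utterance (hypothetical field names).
MEASURES = ("mean_f0_st", "f0_range_st", "artic_rate")


def cell_means(utterances, register, familiarity):
    """Average each measure over one mother's utterances in one Register x Familiarity cell."""
    cell = [u for u in utterances
            if u["register"] == register and u["familiarity"] == familiarity]
    return {m: mean(u[m] for u in cell) for m in MEASURES}


def hyper_scores(utterances, familiarity):
    """Hyper-score = IDS value / ADS value; a score > 1 means the measure is larger in IDS."""
    ids = cell_means(utterances, "IDS", familiarity)
    ads = cell_means(utterances, "ADS", familiarity)
    return {m: ids[m] / ads[m] for m in MEASURES}


def utt(register, familiarity, f0, f0_range, rate):
    """Build one toy utterance record."""
    return {"register": register, "familiarity": familiarity,
            "mean_f0_st": f0, "f0_range_st": f0_range, "artic_rate": rate}


# Toy data for one hypothetical mother (invented numbers, not study data).
mother = [
    utt("ADS", "Unfamiliar", 90.0, 8.0, 5.0),
    utt("ADS", "Unfamiliar", 92.0, 10.0, 5.0),
    utt("IDS", "Unfamiliar", 96.0, 13.0, 4.0),
    utt("IDS", "Unfamiliar", 98.0, 14.0, 4.0),
]
scores = hyper_scores(mother, "Unfamiliar")
# f0_range_st = 13.5 / 9 = 1.5, artic_rate = 4 / 5 = 0.8, mean_f0_st = 97 / 91
```

The raw-value predictor for the same mother would simply be `cell_means(mother, "IDS", "Unfamiliar")`; the hyper-score rescales it by her own ADS baseline.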

We have two research questions:

First, we ask whether the three prosodic parameters of individual mothers’ IDS – mean pitch, pitch range, and articulation rate – predict children’s concurrent vocabulary size and longitudinal vocabulary growth. Since prototypical IDS prosody has been shown to facilitate children’s online word learning (e.g., Ma et al., 2011; Mani & Pätzold, 2016; Thiessen et al., 2005), we predict a correlation between the raw prosodic values and children’s vocabulary size. Specifically, we predict that a higher mean pitch, a larger pitch range, and a slower speaking rate are associated with children’s larger vocabulary size and vocabulary growth. Also, we predict that the prosodic hyper-scores, which indicate the extent to which mothers modify their IDS compared to ADS, are positively correlated with children’s vocabulary size and vocabulary growth.

Second, as we are interested in the effect of word-learning context, we ask whether the correlations between prosody and children’s language outcomes differ when mothers introduce familiar vs. unfamiliar words to their children. Since word-learning contexts in which mothers introduce unfamiliar words are immediately relevant to children’s novel word learning, we predict that IDS prosody when a mother introduces unfamiliar words to her child will be correlated with children’s vocabulary size and growth, and will explain individual differences in children’s vocabulary better than IDS prosody when introducing familiar words.

Method

Participants

This study is part of a larger cross-linguistic study on Dutch and Mandarin Chinese infant-directed speech (Han, 2019). The speech data collection methods are identical to those reported in Han et al. (2020, 2021).¹ The participants were Dutch-speaking mother-child dyads who were recruited from the Utrecht Baby Lab database and were all Dutch native speakers living in the Utrecht area in the Netherlands. All children were Dutch-learning monolinguals (degree of exposure to a second language < 10%, as measured by the Multilingual Infant Language Questionnaire; Liu & Kager, 2017). We used a longitudinal design and collected mothers’ ADS and IDS speech data when their children were 18 and 24 months old. Forty-nine mother-child dyads participated when the children were 18 months old (mean age of children = 18;15, age range = 18;00–18;29; girls N = 26; mean age of mothers = 35 years, age range = 29–44 years). Thirty-two of these mother-child dyads visited the lab again when the children were 24 months old (mean age of children = 24;18, age range = 24;00–26;30). All children were typically developing, with no reported language or hearing problems. All mothers had completed higher education at the level of HBO (hogescholen ‘universities of applied sciences’) or WO (universiteiten ‘research universities’) and above. All participating mothers signed informed consent forms.

Speech data collection

During each lab visit, the mother-child dyads participated in a semi-spontaneous storybook-telling task. We designed two storybooks, one for 18-month-olds and one for 24-month-olds. Each book contained seven preselected target words that were either familiar or unfamiliar to children (see Table 1 for a list of target words). The book structure was the same for the two age groups, except that the five unfamiliar words were replaced with new unfamiliar words in the 24-month-old version. On each page of the picture book, a target word was printed on the left side, and an illustration depicting it was shown on the right side. No script was provided besides the target words (see Han, 2019, p. 187 for the picture book). An additional six pages were used as fillers to keep the story coherent throughout the book. We selected the default familiar words on the basis of the Dutch version (N-CDI; Zink & Lejaegere, 2002) of the MacArthur-Bates Communicative Development Inventories (CDI; Fenson, Marchman, Thal, Dale, & Reznick, 2007). In contrast, the default unfamiliar words were not listed in the N-CDI. The familiar words were also more frequent than the unfamiliar words. Because vocabulary differs across children, the actual familiarity of the target words might vary among the child participants. Thus, after each experiment, mothers completed a checklist to determine the familiarity of the words for each child. The checklist resembled the N-CDI: for each target word, the mother marked whether her child had “understood” (begrijpen) or “understood and said” (begrijpen en zeggen) it before the experiment. These responses were coded as Familiarity (Familiar/Unfamiliar) and used in the data analyses.
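The familiarity coding reduces to a per-child rule, so the same target word can be Familiar for one child and Unfamiliar for another. A minimal Python sketch, assuming that a word counts as Familiar when either checklist option was marked and as Unfamiliar otherwise; the example words are hypothetical and are not the items in Table 1.

```python
# Checklist responses that mark a target word as known (labels from the text).
FAMILIAR_RESPONSES = {"begrijpen", "begrijpen en zeggen"}


def code_familiarity(checklist):
    """Code each target word as 'Familiar' or 'Unfamiliar' for one child.

    checklist maps a target word to the mother's response: 'begrijpen'
    (understood), 'begrijpen en zeggen' (understood and said), or None
    if neither option was marked (ASSUMPTION: unmarked means Unfamiliar).
    """
    return {word: "Familiar" if response in FAMILIAR_RESPONSES else "Unfamiliar"
            for word, response in checklist.items()}


# Hypothetical words and responses, not the study's target words.
coded = code_familiarity({"hond": "begrijpen en zeggen",
                          "bal": "begrijpen",
                          "zeehond": None})
# coded == {"hond": "Familiar", "bal": "Familiar", "zeehond": "Unfamiliar"}
```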

Table 1. Target words

All participants were tested in a quiet room in the Utrecht Baby Lab. Before the experiment, mothers were given a few minutes to familiarize themselves with the book. The mothers were then instructed to tell the story twice, once to an adult (ADS) and once to their child (IDS). For ADS, mothers were instructed to tell the story to an experimenter (female, a Dutch native speaker), and to take into account the fact that she was a college student. For IDS, the child sat on his or her mother’s lap, and the mother was instructed to tell the story to her child the way she normally would at home. The mothers were specifically told that they could use any sentences; the only requirement was to include the words given on each page. The order of the two speech registers was counterbalanced across participants. Speech data were recorded with a ZOOM H1 recorder with 16-bit resolution and a sampling rate of 44.1 kHz. Each experimental session took about 15–20 minutes. All families received a book as a gift after the experiment.

Prosodic measures

We measured the prosody of utterances containing the target words. We focus on prosody at the utterance level for two reasons. First, previous studies on the correlations between IDS prosody and children’s language outcomes often measured prosody at the utterance level (e.g., Raneri et al., 2020; Song et al., 2018; Suttora, Salerni, Zanchi, Zampini, Spinelli, & Fasolo, 2017). Song et al. (2010) also manipulated articulation rate and pitch range at the utterance level to test the effect of different prosodic cues on infant word recognition. Second, in real-world word-learning settings, novel words are often embedded in an utterance; only a small portion of words addressed to children are presented in isolation (Brent & Siskind, 2001; Han et al., 2021).

To measure utterance prosody, a trained Dutch native speaker annotated and extracted these utterances from the audio recordings using Praat (Boersma & Weenink, 2017). An utterance boundary was defined as “any pause longer than 200ms which is preceded by an intonational phrase boundary (pauses not accompanied by an IP boundary were considered utterance internal),” following Martin, Igarashi, Jincho, and Mazuka (2016, p. 54). In total, 1927 utterances were elicited: 1267 utterances for children at 18 months (ADS N = 552, Familiar N = 173; IDS N = 715, Familiar N = 247) and 660 utterances for those at 24 months (ADS N = 286, Familiar N = 96; IDS N = 374, Familiar N = 134).
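The boundary rule can be sketched in Python as follows. The sketch assumes a hypothetical input format in which the annotated speech is already divided into chunks separated by silences, each flagged for whether it ends at an intonational phrase (IP) boundary; in the study itself, this annotation was done by hand in Praat.

```python
from dataclasses import dataclass

MIN_PAUSE_S = 0.200  # a boundary requires a pause longer than 200 ms


@dataclass
class Chunk:
    """A stretch of speech between silences (hypothetical annotation format)."""
    start: float        # onset in seconds
    end: float          # offset in seconds
    ends_with_ip: bool  # True if the chunk ends at an intonational phrase boundary


def segment_utterances(chunks):
    """Group time-ordered chunks into (start, end) utterance spans.

    A boundary is placed after a chunk only if the following pause is longer
    than MIN_PAUSE_S AND the chunk ends at an IP boundary; any other pause is
    treated as utterance-internal. The last chunk always closes an utterance.
    """
    utterances, current = [], []
    for i, chunk in enumerate(chunks):
        current.append(chunk)
        is_last = i == len(chunks) - 1
        if is_last or (chunks[i + 1].start - chunk.end > MIN_PAUSE_S
                       and chunk.ends_with_ip):
            utterances.append((current[0].start, current[-1].end))
            current = []
    return utterances


chunks = [
    Chunk(0.0, 1.0, True),   # 500 ms pause + IP boundary: utterance ends
    Chunk(1.5, 2.0, False),  # 500 ms pause but no IP boundary: continues
    Chunk(2.5, 3.0, True),   # IP boundary but only a 100 ms pause: continues
    Chunk(3.1, 4.0, True),   # last chunk closes the utterance
]
print(segment_utterances(chunks))  # [(0.0, 1.0), (1.5, 4.0)]
```

Requiring both conditions is what keeps hesitation pauses inside an utterance: a long pause alone, or an IP boundary followed by a short pause, does not split it.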

We extracted the following prosodic measures for the target utterances: articulation rate (syllables/s), mean F0 (in semitones (st); see Footnote 2), and F0 range (maximum F0 minus minimum F0, in st). The pitch values were extracted automatically using a Praat script and checked manually for pitch-doubling and pitch-halving errors. For articulation rate, a Dutch native speaker transcribed the target utterances and manually counted the number of phonological syllables in each. A second coder counted the syllables for 10% of the recordings; intercoder reliability (percentage agreement) was 93%. All prosodic measurements were averaged by Register (ADS/IDS) and Familiarity (Familiar/Unfamiliar) for each mother.
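A minimal Python sketch of the per-utterance measures and the agreement statistic follows. Several assumptions are labeled in the code: the semitone reference value (the study's choice is given in its Footnote 2, which is not reproduced here; 1 Hz is used purely for illustration), that mean F0 is averaged on the semitone scale, and that articulation rate divides by speaking time excluding pauses. F0 range in semitones does not depend on the reference, because the reference cancels in the difference. The F0 samples are assumed to have been corrected for octave errors already.

```python
import math

# ASSUMPTION: semitones re 1 Hz, for illustration only; the study's
# reference value is given in its Footnote 2.
REF_HZ = 1.0


def hz_to_st(f0_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def utterance_measures(f0_hz, n_syllables, speech_time_s):
    """Prosodic measures for one utterance.

    f0_hz: voiced F0 samples in Hz, already checked for doubling/halving errors.
    n_syllables: hand-counted phonological syllables.
    speech_time_s: ASSUMPTION, speaking time excluding pauses.
    """
    st = [hz_to_st(f) for f in f0_hz]
    return {
        "mean_f0_st": sum(st) / len(st),            # ASSUMPTION: averaged on the st scale
        "f0_range_st": max(st) - min(st),           # reference value cancels out
        "artic_rate": n_syllables / speech_time_s,  # syllables per second
    }


def percent_agreement(coder_a, coder_b):
    """Share of items (e.g., utterances) on which two coders' syllable counts match."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


m = utterance_measures([100.0, 150.0, 200.0], n_syllables=10, speech_time_s=2.5)
# m["f0_range_st"] is about 12 st (one octave); m["artic_rate"] is 4.0 syllables/s
print(percent_agreement([3, 4, 5, 6], [3, 4, 5, 7]))  # 0.75
```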

Vocabulary size

All mothers completed the N-CDI: Woorden en Zinnen (‘Words and Sentences’; Zink & Lejaegere, 2002) online twice: once when their children were 18 months old and once at 24 months. Raw receptive vocabulary scores were used in the data analyses.

Statistical analysis

We conducted a series of multiple regression analyses to examine whether the prosody of mothers’ IDS is correlated with children’s vocabulary size concurrently or longitudinally and which prosodic parameters significantly predict children’s vocabulary. Forty-nine participants were tested at 18 months (girls N = 26), and 32 of the participants were tested again at 24 months (girls N = 19). We consider two types of prosodic predictors: raw prosodic values and prosodic hyper-scores. For each type of prosodic predictor, we performed three sets of multiple regression analyses: concurrent correlations at 18 months, concurrent correlations at 24 months, and longitudinal correlations over this time period.

(1) Concurrent correlations at 18 months. Specifically, we examine whether there were concurrent correlations between the prosodic predictors at 18 months and children’s vocabulary size at 18 months. For this analysis, we include speech data from all 49 participants. Six participants were excluded due to missing vocabulary information, resulting in a total of 43 participants in the final analyses.

(2) Concurrent correlations at 24 months. Here we examine whether there were concurrent correlations between the prosodic predictors at 24 months and children’s vocabulary size at 24 months. For this analysis, we include the 32 participants who participated at both ages, of which 5 participants were excluded due to missing vocabulary information at 24 months, resulting in a total of 27 participants in the final analyses.

(3) Longitudinal correlations between the prosodic predictors at 18 months and children’s vocabulary at 24 months. In particular, we examine whether there were longitudinal correlations between the prosodic predictors at 18 months and children’s vocabulary size at 24 months. For this analysis, we again include only the 32 participants who participated at both ages, of whom 5 were excluded due to missing vocabulary information at 24 months, resulting in a total of 27 participants in the final analyses. Individual differences in earlier vocabulary size were accounted for by including children’s vocabulary size at 18 months as a predictor in the model.³

The multiple regressions were done in the R environment (R Core Team, 2018) using the lm() function. The outcome variables were children’s receptive vocabulary at either 18 or 24 months. The predictor variables were raw prosodic values and prosodic hyper-scores. Before building each model, we detected outliers by visual inspection of scatter plots and capped them at the 5th percentile (for outliers below the lower limit) and the 95th percentile (for outliers above the upper limit). For each model, we started by including all the predictors and their interactions with Familiarity⁴ and then used the stepAIC() function of the MASS package (Venables & Ripley, 2002) to reduce the model by selecting variables with a significance level of 5% (direction set to “backward”).
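The outlier capping and backward model reduction can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' R code: `cap_outliers` caps every value outside the 5th to 95th percentile band, whereas the study capped only values it had flagged as outliers by visual inspection; `backward_aic` imitates the backward direction of MASS::stepAIC, which compares candidate models by AIC (default penalty k = 2), using the AIC scale of R's extractAIC() for linear models. The data are synthetic.

```python
import numpy as np


def cap_outliers(x, lower_pct=5, upper_pct=95):
    """Cap values below the lower / above the upper percentile at those percentiles."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)


def ols_aic(X, y):
    """AIC of an OLS fit on the scale of R's extractAIC(): n*log(RSS/n) + 2p."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * p


def backward_aic(X, y, names):
    """Repeatedly drop the column whose removal lowers AIC most; stop when none does.

    Column 0 of X must be the intercept, which is never dropped. Unlike
    MASS::stepAIC, this sketch ignores marginality, so it could drop a main
    effect while keeping an interaction that contains it.
    """
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    while len(keep) > 1:
        trials = [(ols_aic(X[:, [c for c in keep if c != j]], y), j) for j in keep[1:]]
        aic, drop = min(trials)
        if aic >= best:
            break
        best = aic
        keep = [c for c in keep if c != drop]
    return [names[c] for c in keep], best


# Synthetic demo (invented data): vocab24 depends on vocab18 and mean F0 only.
rng = np.random.default_rng(0)
n = 54
vocab18 = rng.normal(250, 100, n)
mean_f0 = cap_outliers(rng.normal(95, 3, n))
unrelated = rng.normal(0, 1, n)
vocab24 = 350 + 0.7 * vocab18 + 9 * (mean_f0 - 95) + rng.normal(0, 40, n)

X = np.column_stack([np.ones(n), vocab18, mean_f0, unrelated])
selected, aic = backward_aic(X, vocab24, ["(Intercept)", "vocab18", "mean_f0", "unrelated"])
print(selected)
```

With this synthetic data, the two predictors that actually drive `vocab24` should survive elimination; the unrelated column may or may not be dropped, since AIC-based selection does not guarantee that a noise predictor is removed.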

Results

Descriptive statistics

Table 2 shows descriptive statistics of the raw prosodic values of IDS and hyper-scores at 18 months. Table 3 shows descriptive statistics of the raw prosodic values of IDS and hyper-scores at 24 months. Supplementary Figures 1-6 show scatter plot matrices of correlations (Pearson correlation coefficients) between all predictors (raw prosodic values and prosodic hyper-scores) and children’s receptive vocabulary.

Table 2. Means and standard deviations (SDs) of raw prosodic values of IDS and hyper-scores at 18 months (N = 43)

Table 3. Means and standard deviations (SDs) of raw prosodic values of IDS and hyper-scores at 18 and 24 months (children who participated longitudinally; N = 27)

The outcome measure was children’s receptive vocabulary. Children’s vocabulary increased significantly from 18 months (M = 247, SD = 103, range = 101–473) to 24 months (M = 529, SD = 90, range = 352–670).

Correlations between the raw prosodic values and children’s vocabulary size

We first examined whether the raw prosodic values of mothers’ IDS could predict children’s concurrent vocabulary at 18 and 24 months, as well as children’s vocabulary growth between these two ages. For the 18-month-old group, the final model showed no significant correlation between the raw prosodic values and children’s vocabulary⁵ (see Supplementary Figure 1). Similarly, the final model for the concurrent correlations at 24 months was not significant⁶ (see Supplementary Figure 2). No predictors remained in either final model.

For the longitudinal correlations between the raw prosodic values of IDS at 18 months and children’s vocabulary growth between 18 and 24 months (see Supplementary Figure 3), the regression analyses (Table 4) showed two significant predictors of children’s vocabulary at 24 months in the final model: utterance mean F0 and children’s vocabulary at 18 months. This model accounted for 73.6% of the variance in children’s vocabulary at 24 months (R² = 0.736, F(5, 48) = 26.71, p < 0.001). Compared to a model with vocabulary at 18 months as the only predictor (Table 5) (R² = 0.642, F(1, 52) = 93.31, p < 0.001), this model explained 9.35% more of the variance. When the non-significant predictors were excluded from the final model, children’s vocabulary at 18 months (β = 0.73, SE = 0.07, t = 10.89, p < 0.001) and mean F0 (β = 8.89, SE = 2.54, t = 3.50, p < 0.001) significantly predicted children’s vocabulary at 24 months (R² = 0.711, F(2, 51) = 62.9, p < 0.001). Compared to the model with vocabulary at 18 months as the only predictor (Table 5), adding mean F0 explained 6.94% more of the variance. These results suggest that a higher mean F0 at 18 months significantly predicts children’s vocabulary growth between 18 and 24 months.
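The reported gain in explained variance can be checked from the rounded statistics with the standard F test for nested OLS models, sketched here in Python. The sample size n = 54 is inferred from the reported degrees of freedom (F(2, 51) with two predictors) and is an assumption of this check, not a figure stated in the text. Because the comparison adds a single predictor, the partial F equals the square of that predictor's t statistic.

```python
def r2_change_f(r2_full, r2_reduced, n, k_full, q):
    """R-squared gain and F statistic for adding q predictors to a nested OLS model.

    n: number of observations; k_full: predictors in the full model
    (intercept excluded); q: number of predictors added.
    """
    delta = r2_full - r2_reduced
    f_stat = (delta / q) / ((1.0 - r2_full) / (n - k_full - 1))
    return delta, f_stat


# Rounded values reported above: vocabulary at 18 months + mean F0 (full)
# vs. vocabulary at 18 months alone (reduced); n = 54 inferred from F(2, 51).
delta, f_stat = r2_change_f(0.711, 0.642, n=54, k_full=2, q=1)
print(round(delta, 3), round(f_stat, 2), round(f_stat ** 0.5, 2))  # 0.069 12.18 3.49
```

The rounded R² values give a gain of 0.069, matching the reported 6.94% up to rounding, and √F ≈ 3.49 agrees with the reported t = 3.50 for mean F0.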

Table 4. Regression model for longitudinal correlations between raw prosodic values at 18 months and children’s vocabulary growth (N = 43)