USING PROSODY TO PREDICT UPCOMING REFERENTS IN THE L1 AND THE L2: THE ROLE OF RECENT EXPOSURE

Anouschka Foltz

doi:10.1017/S0272263120000509

Published online by Cambridge University Press: 09 November 2020

Anouschka Foltz*, University of Graz
*Correspondence concerning this article should be addressed to Anouschka Foltz, Institute of English Studies, University of Graz, Heinrichstraße 36/II, 8010 Graz, Austria. E-mail: anouschka.foltz@uni-graz.at

Abstract

While monolingual speakers can use contrastive pitch accents to predict upcoming referents, bilingual speakers do not always use this cue predictively in their L2. The current study examines the role of recent exposure for predictive processing in native German (L1) second language learners of English (L2). In Experiment 1, participants followed instructions to click on two successive objects, for example, Click on the red carrot/duck. Click on the green/GREEN carrot (where CAPS indicate a contrastive L + H* accent). Participants predicted a repeated noun following a L + H* accent in the L1, but not in the L2, where processing was delayed. Experiment 2 shows that after an exposure period with highly consistent prosodic cues, bilinguals engaged in predictive processing in both their L1 and L2. However, inconsistent prosodic cues showed different effects on bilinguals’ L1 and L2 predictive processing. The results are discussed in terms of exposure-based and resource-deficit models of processing.

Type: Research Article
Information: Studies in Second Language Acquisition, Volume 43, Issue 4, September 2021, pp. 753-780

DOI: https://doi.org/10.1017/S0272263120000509
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Open Practices: Open data
Copyright: © The Author(s), 2020. Published by Cambridge University Press

INTRODUCTION

Abundant evidence in the literature shows that listeners engage in predictive processing in their native language (Kamide, 2008). This suggests that listeners not only integrate incoming sentence material into the phrase they are currently processing but also use the already available information to make predictions as to which words, syntactic structures, and so forth come next. Native listeners can use various linguistic cues and real-world information to make these predictions (Boland, 2005; Kamide et al., 2003; Lau et al., 2006; Lew-Williams & Fernald, 2010; Weber et al., 2006b) and can predict information at various levels of linguistic representation (Boland, 2005; DeLong et al., 2005; Lau et al., 2006; Van Berkum et al., 2005).

There is also abundant evidence that nonnative (L2) speakers do not engage in predictive processing to the same extent as native (L1) speakers (Kaan, 2014). This general finding is reflected in Grüter et al.’s (2017) hypothesis that L2 learners have Reduced Ability to Generate Expectations (RAGE). However, it is not yet clear in which situations and why L2 learners do or do not engage in predictive processing. Specifically, L2 learners may differ from native speakers in their use of predictive processing in terms of the cues that they use for prediction and/or the levels of representation that they predict. Individual learner differences may also contribute to whether or not learners engage in predictive processing. The current study focuses on the use of prosody, specifically contrastive pitch accents, as a cue to predict upcoming referents in German–English bilinguals’ L1 and L2. Furthermore, the current study explores the effect of recent exposure to inconsistent versus consistent prosodic cues on bilinguals’ ability to use prosody as a cue to predict upcoming referents.

USING PROSODY FOR PREDICTION IN THE L1

Studies with adult native speakers of various languages suggest that native listeners can use prosodic cues quickly and effectively to predict upcoming referents (e.g., Ito & Speer, 2008; Weber et al., 2006a). Most of these studies investigate the role of contrastive pitch accents for prediction during discourse processing. Contrastive pitch accents in both English and German consist of a low target, followed by a steep rise in pitch to a high target on the stressed syllable of the pitch-accented word, typically toward the end of the stressed vowel (L + H* using ToBI labeling, cf. Silverman et al., 1992). Thus, they are characterized by a large and salient pitch excursion (as well as lengthening).

Native English listeners can use L + H* accents to predict upcoming referents in a discourse context like Hang the blue angel. […] Now, hang the GREEN… (Ito & Speer, 2008). Specifically, listeners started looking at the angel when hearing GREEN, and crucially, before hearing the following noun, thus predicting that the noun angel would be repeated. This led to an anticipatory effect if angel did indeed follow GREEN, but caused a prosodic garden-path effect if a different noun, such as drum, followed GREEN. Such predictive processing did not occur if the instruction contained no contrastive pitch accent, as in Hang the blue angel. […] Now, hang the green… (Ito & Speer, 2008, Experiment 2).

Similarly, native German listeners looked at a picture of red scissors earlier when hearing German instructions to Click on the purple scissors followed by Click on the RED… compared to Click on the red… (Weber et al., 2006a). Again, facilitation occurred if the previous noun was repeated, and a prosodic garden-path effect occurred if the previous noun was not repeated. Furthermore, there was a smaller garden-path effect for sequences like Click on the purple scissors. Click on the red vase. This suggests that participants expected a repeated noun in successive instructions regardless of prosody, but that a contrastive accent on the adjective of the second instruction strengthened this expectation.

Overall, native English and German listeners can use contrastive pitch accents to predict upcoming referents. Moreover, prosodic information is used rapidly for prediction during processing as the relevant prosodic cue was produced on the word immediately preceding the predicted noun, and prediction thus occurred as soon as listeners heard the cue.

PREDICTIVE PROCESSING IN THE L2

Predictive processing in the L2 differs markedly from the L1 in that L2 learners engage in predictive processing in fewer processing situations than native speakers (e.g., Dussias et al., 2013; Hopp, 2013), and even when their knowledge of the words and syntactic structures involved in the processing is comparable to that of native speakers (e.g., Grüter et al., 2012, 2017; Lew-Williams & Fernald, 2010). Various factors influence whether or not L2 learners engage in predictive processing. For example, more proficient L2 learners can show nativelike predictive processing (Dussias et al., 2013; Hopp, 2013), and there is evidence that L2 learners can engage in more nativelike processing when their L1 is similar to their L2 (Dussias et al., 2013; Foucart & Frenck-Mestre, 2011; Sabourin & Stowe, 2008).

The few studies that have investigated whether and how L2 learners use contrastive pitch accents for predictive processing suggest that L2 learners can use these cues for prediction if they can use knowledge and processing routines from their L1 in their L2. For example, native Spanish L2 learners of English only used contrastive pitch accents predictively in English if equivalent prosodic structures occurred in Spanish (Klassen, 2015). Spanish assigns prominence to the rightmost element in a prosodic phrase and this prominence can shift leftward only in the case of a correction, but not a contrast (Klassen, 2015). Consequently, to indicate a contrast, a prosodic structure like Move pumpkin number THREE is possible in Spanish, but a prosodic structure like Move PUMPKIN number three is not, whereas both are possible in English. In line with this, native Spanish and native English speakers showed anticipatory eye movements to a picture of pumpkin number two for instructions like Move pumpkin number THREE to pumpkin number TWO, which both English and Spanish allow. In contrast, only native English, but not native Spanish, speakers showed anticipatory eye movements to a picture of rocket number three for instructions like Move PUMPKIN number three to ROCKET number three, which Spanish does not allow. Similarly, native French speakers, but not native English L2 learners of French, used contrastive prosody to predict upcoming referents in French (Namjoshi, 2015), and native English speakers, but not native Japanese and Chinese L2 learners of English, used contrastive prosody predictively in English (Perdomo & Kaan, 2019; Takeda, 2018). Here, learners could not easily use knowledge and processing routines from their L1 in their L2. Specifically, the intonational systems of French, Japanese, and Chinese differ substantially from that of English. In French, the pitch accent that most typically conveys a contrast is not a L + H* accent, but a high tone (H) on the initial syllable of the contrasted word, which also carries a H* accent on its final syllable (Di Cristo, 1998). Japanese and Chinese prosodically mark contrastive information not through pitch accents, but through local pitch range expansion (Greif, 2010; Venditti et al., 2008).

Overall, the findings so far suggest that L2 learners may only be able to use contrastive pitch accents as a cue for prediction if their native language is sufficiently similar to their L2 in terms of contrastive accentuation.

THE ROLE OF EXPOSURE IN PREDICTIVE PROCESSING

Exposure can influence bilinguals’ processing routines, and this effect can be independent of proficiency (Dussias & Sagarra, 2007). Exposure may be especially important for cues like contrastive pitch accents because they are optional, unlike some other cues, such as grammatical gender assignment (Dussias et al., 2013). For example, English adjectives that appear in the context of a lexical contrast receive a contrastive pitch accent that marks this contrast only around 50% of the time. This number goes down even further for lexically contrasted nouns, which receive a contrastive pitch accent only about 20% of the time (Ito & Speer, 2006).

Recent exposure does indeed influence how native listeners interpret contrastive pitch accents (Kurumada et al., 2012, 2014). Specifically, previous exposure to a reliable or unreliable speaker, that is, a speaker whose use of contrastive pitch accents did or did not provide reliable information about referents, influences how native English listeners interpret statements such as It looks like a zebra compared to It LOOKS like a zebra (Kurumada et al., 2014). Participants exposed to the reliable speaker looked at the picture of an okapi (which has striped legs and looks quite similar to a zebra) reliably more often than at the picture of a zebra when hearing LOOKS, but not when hearing looks, suggesting that they used the L + H* accent as a cue that the speaker was not referring to a zebra. No such effect was found for participants exposed to the unreliable speaker, suggesting that native listeners consider the prior reliability of the prosodic cue when making predictions during language processing.

EXPOSURE AND PREDICTION IN MODELS OF L2 LANGUAGE PROCESSING

Exposure plays a major role in several models of language processing that allow for predictive processing and have been applied to L2 processing. Such models include constraint-based models (Dussias & Cramer Scaltz, 2008; MacDonald et al., 1994), tuning models (Cuetos et al., 1996; Dussias & Sagarra, 2007), and implicit learning models (e.g., Chang et al., 2012). Constraint-based models assume that listeners use all the available information immediately during processing and that several alternatives may be activated in parallel. The processor selects an alternative by weighing different constraints, whose strength is determined probabilistically through effects of frequency, plausibility, and so forth (Altmann, 1998). Many of the proposed constraints, such as global syntactic biases, biases of individual words, and so forth, are based on input frequency and are thus directly related to exposure. Similarly, tuning models and implicit learning models assume that processing is experience based, and that listeners would keep track of, for example, whether or not lexical contrasts are prosodically marked with a contrastive pitch accent and would adjust their predictions accordingly (Cuetos et al., 1996). Thus, frequency information derived from exposure plays a major role in these models.

Importantly, all these models also explicitly incorporate predictive processing. Predictions as to what comes next are modeled in terms of weightings or adjustments that are based on frequency and other information derived from the input. What exactly is predicted depends on the weightings for the individual options, which in turn are influenced by how often each option is encountered in the input and/or by how often each option encountered in the input matches the prediction. Because exposure plays a major role in these models, similar exposure patterns should lead to similar processing.

Some models of L2 language processing do not directly incorporate exposure-based effects and predictive processing into their mechanism, but are compatible with such effects. For example, models that focus on resource deficits, such as computational difficulties in L2 processing (Hopp, 2009; McDonald, 2006), are compatible with both predictive processing and exposure-based effects. These models assume that differences in L1 and L2 processing are mainly due to resource deficits and would predict that L2 learners can generally engage in predictive processing, but may not be able to do so with increasing task complexity, slower lexical access, or less automatic processing routines.

THE CURRENT STUDY

The current study focuses on the role of recent exposure for using contrastive pitch accents to predict upcoming referents in bilinguals’ L1 and L2. The specific focus of the current study is on comparing predictive processing across the bilinguals’ two languages. That is, the focus is on how participants respond in their L1 versus their L2 when faced with the same changing processing situation.

Experiment 1 compares predictive processing in the L1 and the L2 in intermediate to advanced German–English bilinguals. Based on the previous literature, German–English bilinguals should be able to use their L1 knowledge in their L2 and engage in predictive processing both in the L1 and the L2. However, if participants are generally slower in processing their L2, which would be most compatible with a resource-deficit account, participants may only engage in predictive processing in their L1, but not their L2. During Experiment 1, the speaker that participants encounter uses prosodic cues inconsistently, such that predictive processing should decrease over the course of the experiment (Kurumada et al., 2012, 2014).

Experiment 2 is preceded by an exposure phase in which participants experience the same speaker consistently using contrastive pitch accents as a cue to upcoming referents. This consistent exposure should facilitate predictive processing in the following experimental trials in both the L1 and the L2. During the experimental trials, participants again encounter the speaker using prosodic cues inconsistently, such that predictive processing should again decrease over the course of the experiment, though possibly more slowly than in Experiment 1.

EXPERIMENT 1

METHODS

Participants

Seventeen native-German intermediate-to-advanced (B2 or above using CEFR levels; Council of Europe, 2001) learners of English (4 male, 13 female; mean age 24.5, SD = 5.2) participated in the study. An additional participant was excluded due to more than 20% of track loss. Including this participant in the analysis did not change any of the results. Participants self-rated their English proficiency on a scale with 1 being beginner, 2 being good at English, 3 being very good at English, 4 being fluent, and 5 being native. Participants’ average ratings for their reading and writing abilities were 3.3 (SD = 0.8) and 2.7 (SD = 0.7), respectively. Comprehension and speaking abilities were rated as an average of 3.0 (SD = 0.7) and 2.6 (SD = 0.9), respectively. Participants had been learning English for an average of 10.9 years (SD = 2.9) at the time of the study.

Materials

The materials for this study comprised line drawings of different objects and recorded instructions to click on these objects. Twenty-four line drawings, which were either under a creative commons license or freely available online, were grouped into sets of four (see Appendix A). In any given trial, six objects from one set were displayed on the computer screen (see Figure 1). The German names for the objects in each set had the same grammatical gender, so that listeners could not identify an object based on hearing the gender-marked definite article that preceded each mention of the object in German (Lew-Williams & Fernald, 2010). Each line drawing was colored in four different colors (blue, green, red, and yellow) using GIMP (The GIMP team, 2014), for a total of 16 objects (four objects in four different colors) in each set.

FIGURE 1. Sample experimental display. Objects pictured were adapted from materials by Saskia, Gast, and Janina Valko and are available at madoo.net under a (cc) Creative Commons by-sa license.

Instructions to click on the objects were recorded for all objects in all four colors in both German and English by a balanced German–English bilingual with phonetic training. Instructions were of the form Click on the [color] [object name], for example, Click on the green banana or Klick die grüne Banane an (literally: Click the green banana on). All instructions were recorded with three prosodic patterns on the adjective and noun: A L + H* accent on the adjective and no pitch accent on the noun (LHA prosody), no pitch accent on the adjective and a L + H* accent on the noun (LHN prosody), or no L + H* accent on the adjective or noun. The latter most typically resulted in a H* accent on the adjective and a !H* on the noun (HH prosody).

Tables 1 and 2 show the means for duration, f0 minimum, f0 maximum, and f0 range (extracted using Praat; Boersma & Weenink, 2017) for the productions in the three prosodic conditions for German and English, respectively. Because this article concerns predictive processing in response to the prosody of the adjective, the tables show these summary measures for the adjective only. F0 minima and maxima were manually checked for pitch halving, doubling, and segmental effects, and were hand corrected if needed. F0 range was calculated by subtracting f0 minimum from f0 maximum. For each of the four measures, Tables 1 and 2 also show the results of a one-way ANOVA and a Tukey’s multiple comparisons of means post-hoc test comparing the three prosodic conditions. For the LHA condition, where the adjective carries a L + H* accent, the tables also show the percentage of the adjective that has elapsed when the peak of the contrastive accent is encountered. All the summary measures show significant differences across the three prosodic conditions in both languages, such that the adjective in the LHA condition is, as expected, significantly longer in duration, has a significantly higher f0 maximum, and has a significantly larger f0 range than both the LHN and HH conditions.
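The two derived measures described above can be illustrated with a minimal sketch. The function names and the example values are hypothetical and are not taken from the study's Praat output; the sketch only restates the arithmetic given in the text (range as maximum minus minimum, peak location as a percentage of adjective duration).

```python
def f0_range(f0_min_hz, f0_max_hz):
    """F0 range: f0 maximum minus f0 minimum, in Hz."""
    return f0_max_hz - f0_min_hz


def peak_location_percent(adjective_onset_ms, adjective_offset_ms, peak_ms):
    """Percentage of the adjective that has elapsed when the f0 peak occurs."""
    duration_ms = adjective_offset_ms - adjective_onset_ms
    return 100 * (peak_ms - adjective_onset_ms) / duration_ms


# Hypothetical values for one L + H*-accented adjective (not study data).
example_range = f0_range(180, 310)
example_peak = peak_location_percent(500, 900, 800)
```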

TABLE 1. Mean duration, f0 minimum, and f0 maximum in the three prosodic conditions (LHA, LHN, and HH) for the adjective in German. Peak location as a percentage of adjective duration is also given for the LHA condition

TABLE 2. Mean duration, f0 minimum, and f0 maximum in the three prosodic conditions (LHA, LHN, and HH) for the adjective in English. Peak location as a percentage of adjective duration is also given for the LHA condition

An additional two-sample t-test shows that the German target adjectives are significantly longer in duration than the English target adjectives (t = −15.82, df = 344.26, p < 0.001), giving participants more time to process the German compared to the English target adjectives. The possible implications of this will be addressed in the discussion section for Experiment 1.
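The fractional degrees of freedom reported here (df = 344.26) are typical of Welch's unequal-variance t-test, although the text does not name the variant that was used. As a minimal sketch under that assumption, the statistic and its Welch-Satterthwaite degrees of freedom can be computed as follows. The duration values are hypothetical and serve only to show the calculation.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of the mean for each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; the result is usually not an integer.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical adjective durations in ms (not the study's measurements).
english_ms = [310.0, 295.0, 330.0, 305.0, 320.0]
german_ms = [400.0, 380.0, 450.0, 410.0, 390.0, 430.0]
t_value, df_value = welch_t(english_ms, german_ms)
```

With the shorter group entered first, the sign of t is negative, which matches the direction of the reported statistic if the English durations were entered first; that ordering is an inference, not something the text states.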

For each trial, six pictures from one picture set were combined with two recorded instructions to click on two of the displayed objects. For example, the display in Figure 1 was combined with an instruction to click on the red duck and a following instruction to click on the green banana. The two successive instructions either had a repeated noun such that they differed only in the color adjective (Color Contrast Condition), a repeated adjective such that they differed only in object type (Object Contrast Condition), or neither a repeated adjective nor a repeated noun (No Contrast Condition). The first instruction was always produced with rather neutral HH prosody. The second instruction was produced either with LHA prosody (LHA Condition), LHN prosody (LHN Condition), or again with HH prosody (HH Condition). The contrast and prosody conditions were combined to yield eight experimental and filler conditions, listed in Table 3.

TABLE 3. Experimental prosodic and contrast conditions

The displays that participants saw in the experimental conditions always showed the object mentioned in the first instruction (e.g., a red duck) and two objects in a different color, one of which would be mentioned in the second instruction. Of these, one object was of the same type as the first-mentioned object (e.g., a green duck; Color Contrast condition) and the other was of a different type (e.g., a green banana; No Contrast condition). Moreover, each display contained three filler pictures (e.g., a yellow banana, a yellow carrot, and blue pants, cf. Figure 1). The displays for the filler trials always showed the two objects mentioned in the two instructions (e.g., a red duck and a green banana), each of the two object types in a different color (e.g., a green duck and a yellow banana), and two filler pictures (e.g., a yellow carrot and blue pants).

The order of trials for the German and English versions of the experiment was identical. Picture sets and contrast conditions were distributed across the experiment in a Latin square design. The location of objects from the same picture set differed across trials, so that across the experiment no display was identical. Prosody conditions were distributed so that every other trial had HH prosody, with LHA and LHN prosody distributed pseudorandomly across the experiment.

PROCEDURE

Participants came to the lab on two different days, approximately 1 week apart. The procedure on both days was the same. Half the participants participated in the German version of the experiment in the first session and the English version in the second session, and vice versa for the other half of the participants. After giving informed consent, participants were first familiarized with all the objects and object names through black and white line drawings of the objects on printed cards. The experimenter asked participants to name each object and corrected the object name when needed.

Participants then engaged in two production tasks (first and fourth tasks) and two eye-tracking tasks (second and third tasks), with a short break after each task. The current study focuses on the eye-tracking results, so detailed information about the production tasks will not be reported here. Briefly, participants saw the same kinds of displays and objects as in the eye-tracking tasks (see Figure 1) and produced verbal instructions to click on two successive objects marked as 1 and 2 in the display using the sentence frame Click on the [color] [object name]. Thus, the first production task further familiarized participants with the display layout and objects shown in the eye-tracking tasks.

Participants’ eye movements were recorded using a Tobii Pro X2-60 remote eye tracker, attached to a Dell 25-inch monitor. Participants were seated at a comfortable distance from the screen and calibrated using a nine-point calibration procedure. Participants were informed that during each trial they would see six different objects on the computer screen and listen to two successive instructions to click on two of the objects. Their task was to follow the instructions and click on the mentioned objects with the computer mouse. Each trial was preceded by 250 ms of blank screen, the first instruction began 200 ms after the onset of the visual display, and the second instruction began 200 ms after participants had clicked on the first object.

For Experiment 1 (second task), participants completed six trials in all the conditions listed in Table 3, for a total of 48 trials. Importantly, this means that the speaker was inconsistent, such that prosody was not informative with respect to contrast condition. For example, Click on the red duck. Click on the GREEN… was equally frequently followed by duck and banana. Participants could thus not develop expectations as to whether or not the noun would be repeated solely based on the prosodic pattern on the adjective.
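One way to make the notion of an inconsistent speaker concrete is to compute how often a L + H*-accented adjective is followed by a repeated noun. The sketch below assumes six trials in each of the two LHA target conditions (Color Contrast and No Contrast) and leaves out any other conditions in Table 3, which are not reproduced here. The function name and the trial encoding are hypothetical.

```python
def cue_validity(trials, cue="LHA"):
    """Share of trials with the given adjective prosody whose noun is repeated."""
    cued = [noun_repeated for prosody, noun_repeated in trials if prosody == cue]
    return sum(cued) / len(cued)


# Each trial is (prosody of the second instruction, noun repeated?).
# Six LHA trials with a repeated noun and six with a different noun.
inconsistent_speaker = [("LHA", True)] * 6 + [("LHA", False)] * 6
experiment_1_validity = cue_validity(inconsistent_speaker)

# Hypothetical fully consistent speaker, for comparison only.
consistent_speaker = [("LHA", True)] * 12
consistent_validity = cue_validity(consistent_speaker)
```

A validity of 0.5 means that the accent carries no information about whether the noun will be repeated, which is the situation described for Experiment 1.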

After a small break, participants completed Experiment 2 (third task), which will be described in more detail in the following text. Afterward, participants completed another production task (fourth task), followed by a language background questionnaire and the opportunity to participate in a gift card drawing.

DATA ANALYSIS

Participants’ proportion of looks over time to the target object (numerator), that is, the object mentioned in the second instruction, relative to looks to all six objects shown on the screen (denominator) will be used to measure whether they used contrastive pitch accents to predict upcoming referents in the L1 and the L2. Track loss affected 8.9% of data points, distributed relatively evenly across conditions (difference within 3%). Statistical power is similar to Weber et al. (2006a), which most closely resembles the current study, with 102 trials per condition (17 participants × 6 trials per condition) compared to Weber et al.’s 96 (24 participants × 4 trials per condition).
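The dependent measure can be sketched as follows. The sample format, the area-of-interest labels, and the 50 ms bin width are assumptions made for illustration and do not describe the study's actual data files or analysis scripts.

```python
from collections import defaultdict


def target_proportions(samples, bin_ms=50):
    """Proportion of looks to the target per time bin.

    `samples` is an iterable of (time_ms, aoi) pairs, where `aoi` is
    "target", any other object label, or None (no object fixated or track loss).
    """
    target = defaultdict(int)
    on_object = defaultdict(int)
    for time_ms, aoi in samples:
        if aoi is None:
            continue  # not a look to one of the six objects
        bin_start = (time_ms // bin_ms) * bin_ms
        on_object[bin_start] += 1  # denominator: looks to any object
        if aoi == "target":
            target[bin_start] += 1  # numerator: looks to the target
    return {b: target[b] / on_object[b] for b in sorted(on_object)}


# Hypothetical samples, with times relative to the end of the adjective.
samples = [
    (-20, "competitor"), (-10, None), (10, "target"),
    (30, "filler"), (60, "target"), (70, "target"),
]
proportions = target_proportions(samples)
```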

If participants use L + H* accents predictively, they should expect a repeated noun upon encountering a L + H*-accented adjective, but not upon encountering an adjective with no L + H* accent. Thus, for the LHA conditions, the curves that show proportion of looks over time should rise earlier in the Color Contrast condition compared to the No Contrast condition. But for the HH conditions, the proportion of looks over time curves should rise at a similar time in the Color Contrast and No Contrast conditions.

Of particular interest for the current study is at which points in time participants look significantly more at the target object in the Color Contrast condition compared to the No Contrast condition. This will be investigated using Smoothing Spline ANOVA analyses (SSANOVA; cf. Gu, 2013), a statistical analysis used to compare curves. SSANOVAs allow for a holistic comparison of curves and can tell us when over time a particular condition yields significantly more looks to the target object over other conditions (Davidson, 2006; Gu, 2013). To do this, SSANOVAs fit smoothing splines to the curves of the experimental conditions being compared. The original eye-tracking data is typically rather noisy and produces jagged curves. Smoothing splines determine which smoothed curves best fit the data by balancing goodness-of-fit and smoothness of the original curve. Bayesian confidence intervals can then tell us which sections of the curves diverge statistically significantly. Specifically, we find statistically significant differences between two curves where their confidence intervals do not overlap (Chanethom, 2011; Koops, 2010). This, in turn, tells us when over time participants have reliably more looks to the target object in one condition compared to another. Thus, significance can be determined visually from the plotted curves and confidence intervals. Data and analysis scripts are available on the Open Science Framework at https://osf.io/vuwq9
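The non-overlap criterion can also be checked programmatically rather than by eye. The sketch below takes interval bounds that are assumed to have been exported from a fitted model; it does not fit the smoothing splines itself, and the function name and input format are hypothetical.

```python
def divergence_windows(times, ci_a, ci_b):
    """Return (start, end) time spans where two interval bands do not overlap.

    `ci_a` and `ci_b` are lists of (lower, upper) bounds, one per time point.
    """
    windows = []
    start = None
    previous = None
    for t, (lo_a, hi_a), (lo_b, hi_b) in zip(times, ci_a, ci_b):
        # The bands are separated if one lies entirely above the other.
        separated = lo_a > hi_b or lo_b > hi_a
        if separated and start is None:
            start = t
        elif not separated and start is not None:
            windows.append((start, previous))
            start = None
        previous = t
    if start is not None:
        windows.append((start, previous))
    return windows


# Hypothetical interval bounds for two conditions at five time points.
times = [0, 50, 100, 150, 200]
ci_color_contrast = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6), (0.5, 0.6), (0.6, 0.7)]
ci_no_contrast = [(0.1, 0.2), (0.1, 0.2), (0.2, 0.3), (0.45, 0.55), (0.3, 0.4)]
windows = divergence_windows(times, ci_color_contrast, ci_no_contrast)
```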

RESULTS

L1 PROCESSING

Figure 2a shows participants’ proportion of looks over time to the target object in the target conditions in the L1. All results figures are aligned such that 0 ms on the x-axis, visually represented by a vertical solid line, represents the end of the adjective, which coincides with the beginning of the noun. Because it takes about 150–200 ms to plan and execute an eye movement (Fischer, 1998), an additional, vertical dashed line at 150 ms shows the earliest point in time at which we would expect eye movements to the target object in response to hearing the beginning of the noun. Thus, all eye movements to the target object that occurred before 150 ms were clearly in response to hearing the adjective of the instruction, and before disambiguating segmental information from the noun had arrived.
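The alignment and the 150 ms criterion can be expressed as a small sketch. The function names and the example timestamps are hypothetical; the threshold uses the lower bound of the 150–200 ms estimate cited above.

```python
SACCADE_LATENCY_MS = 150  # lower bound of the cited 150-200 ms estimate


def align_to_adjective_offset(look_onset_ms, adjective_offset_ms):
    """Re-express a look onset relative to the end of the adjective (0 ms)."""
    return look_onset_ms - adjective_offset_ms


def is_anticipatory(aligned_ms):
    """True if the look began before noun information could have driven it."""
    return aligned_ms < SACCADE_LATENCY_MS


# Hypothetical trial: the adjective ends 1200 ms into the recording,
# and a look to the target begins at 1320 ms.
aligned = align_to_adjective_offset(1320, 1200)
anticipatory = is_anticipatory(aligned)
```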

FIGURE 2. (a) Proportion of looks over time to the target object across target conditions in German (L1). SSANOVA results comparing looks to the target object for (b) HH and (c) LHA conditions.

Figure 2a shows that looks to the target object started rising earliest for a L + H* accent on the adjective followed by a repeated noun and last for a L + H* accent on the adjective followed by a different noun. This pattern mirrors the results from previous studies with native German speakers.

A SSANOVA investigated when in time the four curves diverge significantly. The SSANOVA was conducted in R using the gss package (Gu, 2014) and the gssanova() function, which fits SSANOVA models for non-Gaussian responses. Because responses from the current eye-tracking experiment are binomial (0 = not looking at the target; 1 = looking at the target), a binomial error distribution was selected (family = “binomial”). The statistical model’s response variable was looks to the target (0 vs. 1), and the fixed factors were time (from −500 ms to 1,000 ms), contrast condition (Color Contrast vs. No Contrast), prosody condition (LHA vs. HH), and all their interactions. The model also included random intercepts for participant and item. All SSANOVA graphs in this article show curves derived from analyses containing all fixed factors and interactions. However, the HH and LHA conditions are plotted separately here and in all the following SSANOVA graphs because graphs showing curves and confidence intervals for all four conditions would be rather cluttered and difficult to read. Thus, Figure 2b shows the smoothed curves and the 95% Bayesian confidence intervals for the HH conditions, whereas Figure 2c shows them for the LHA conditions.
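To make the binary response coding concrete, the following Python sketch shows how 0/1 looks to the target could be aggregated into proportions of looks per condition and time bin, as plotted in the raw proportion curves. It is an illustration only and not part of the reported analysis: the function name proportion_of_looks, the 50 ms bin width, and the sample data are all assumptions.

```python
from collections import defaultdict


def proportion_of_looks(samples, bin_ms=50):
    """Aggregate binary looks into proportions per (condition, time bin).

    samples: iterable of (condition, time_ms, look), where time_ms is relative
    to noun onset (0 ms) and look is 1 (looking at the target) or 0 (not).
    bin_ms is a hypothetical bin width, not a value reported in the article.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [target looks, all samples]
    for condition, time_ms, look in samples:
        # Floor division also places negative times in the correct bin.
        bin_start = (time_ms // bin_ms) * bin_ms
        counts[(condition, bin_start)][0] += look
        counts[(condition, bin_start)][1] += 1
    return {key: looks / total for key, (looks, total) in counts.items()}


# Hypothetical samples, NOT data from the study.
samples = [
    ("Color Contrast LHA", -30, 0),
    ("Color Contrast LHA", -10, 1),
    ("Color Contrast LHA", 20, 1),
    ("Color Contrast LHA", 40, 1),
    ("No Contrast LHA", -30, 0),
    ("No Contrast LHA", -10, 0),
    ("No Contrast LHA", 20, 0),
    ("No Contrast LHA", 40, 1),
]

proportions = proportion_of_looks(samples)
for key in sorted(proportions):
    print(key, proportions[key])
```

Note that the SSANOVA itself models the binary responses directly with a binomial error distribution; the aggregation sketched here only corresponds to the descriptive proportion curves.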

Figure 2b shows that the modeled looks to the target object start rising earlier in the Color Contrast HH Condition compared to the No Contrast HH Condition, but at no point in time do the two curves diverge sufficiently for this difference to reach significance. Thus, there is no evidence that participants predict a repeated noun if the adjective does not receive a L + H* accent. Figure 2c shows that the modeled looks to the target object start rising earlier in the Color Contrast LHA Condition compared to the No Contrast LHA Condition, and that the curves diverge significantly from around 0 ms to about 650 ms. Thus, participants start looking at the target object in the LHA conditions earlier when the noun is repeated compared to when it is not. Importantly, this difference is significant in a window that starts before 150 ms, that is, during the processing of the color adjective. This suggests that participants are predicting a repeated noun rather than a different noun if the adjective received a L + H* accent.

To explore whether exposure to an inconsistent speaker over the course of the experiment influences predictive processing in the L1, Figure 3 shows the SSANOVA results for the LHA conditions separately for the first and second halves of the experiment. Figure 3a shows that during the first half of the experiment, participants engage in predictive processing, with the Color Contrast and No Contrast curves diverging from about −50 ms to 750 ms. Figure 3b shows results from the second half of the experiment and suggests that, as participants are exposed to the inconsistent speaker over the course of the experiment, their processing is no longer predictive. That is, the Color Contrast and No Contrast curves diverge reliably only from 200 ms to 450 ms, that is, when segmental information from the noun has started arriving.

FIGURE 3. SSANOVA results for German (L1) comparing looks to the target object for LHA conditions in the (a) first and (b) second half of the experiment.

L2 PROCESSING

Figure 4a shows participants’ proportion of looks over time to the target object for the experimental conditions in the L2. The figure shows the same overall pattern as Figure 2a for the L1 data. However, the curves cluster much closer together than in the L1 data. A SSANOVA identical to the one described in the preceding text investigated when in time these four curves diverge significantly. Figures 4b (HH conditions) and 4c (LHA conditions) show the smoothed curves and the 95% Bayesian confidence intervals.

FIGURE 4. (a) Proportion of looks over time to the target object across target conditions in English (L2). SSANOVA results comparing looks to the target object for (b) HH and (c) LHA conditions.

Figure 4b shows that the modeled looks to the target object start rising earlier in the Color Contrast HH Condition compared to the No Contrast HH Condition. Similar to the L1 data mentioned previously, at no point in time do the two curves diverge statistically significantly. Figure 4c shows that the modeled looks to the target object again start rising earlier in the Color Contrast LHA Condition compared to the No Contrast LHA Condition, and the curves diverge significantly from around 275 ms to 600 ms. Thus, participants start looking at the target object in the LHA conditions earlier when the noun is repeated compared to when it is not. However, the window starts after 150 ms, that is, after the color adjective has been fully processed and participants have started hearing the beginning of the target noun. This suggests that participants show facilitative processing for a repeated noun compared to a different noun if the adjective received a L + H* accent, but there is no evidence that this facilitative processing is predictive.

To explore whether exposure to an inconsistent speaker over the course of the experiment influences predictive processing in the L2, Figure 5 shows the SSANOVA results for the LHA conditions separately for the first and second halves of the experiment. Figure 5a shows that the Color Contrast and No Contrast curves do not diverge during the first half of the experiment, suggesting participants initially do not engage in predictive processing. Figure 5b shows results from the second half of the experiment and suggests that, as participants are exposed to the inconsistent speaker over the course of the experiment, their processing starts showing evidence for being predictive. Specifically, the Color Contrast and No Contrast curves diverge reliably from about 125 ms to 525 ms, that is, the curves start diverging just slightly before segmental information from the noun has started arriving.

FIGURE 5. SSANOVA results for English (L2) comparing looks to the target object for LHA conditions in the (a) first and (b) second half of the experiment.

DISCUSSION

Experiment 1 investigated whether German–English bilinguals used contrastive pitch accents predictively in both their L1 and L2, and whether their processing changed over the course of the experiment. Participants showed an overall advantage in processing repeated nouns following a L + H* accent in both their L1 and L2. This suggests that participants could clearly use the prosodic cue in both languages. However, while this advantage was clearly a result of predictive processing in the L1, with the advantage occurring before 150 ms, this was not the case for the L2. Here, the advantage occurred after segmental information from the beginning of the noun had come in. A close look at the stimuli suggests that this may be due to the time that participants had to process L + H*-accented adjectives in their L1 German versus their L2 English. The bisyllabic German L + H*-accented adjectives were significantly longer compared to the mostly monosyllabic English L + H*-accented adjectives. Furthermore, the peak of the L + H* occurred on average 85 ms earlier in the German L + H*-accented adjectives compared to the English L + H*-accented adjectives (187 ms compared to 102 ms before the beginning of the disambiguating noun; two-sample t-test: t = −14.10, df = 166.53, p < 0.001). Thus, participants had about 85 ms more time to predict upcoming referents based on the prosodic cues in their L1 German compared to their L2 English. However, even when adjusting for these timing differences in the stimuli, participants in Experiment 1 would still exhibit nonpredictive processing in their L2. Specifically, if the L2 curves were adjusted by 85 ms, they would diverge at around 190 ms, that is, once segmental information from the noun has started arriving. Furthermore, monolingual native English speakers show evidence of clear predictive processing in their L1 English with comparable stimuli and in a somewhat more complex discourse situation (Ito & Speer, 2008). What remains is evidence for predictive processing in the L1 and no such evidence for the L2.
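The timing adjustment in the preceding paragraph can be retraced with a short Python sketch. All numbers are taken from the text; the function name is_predictive and the use of the 150 ms lower bound of the eye-movement latency as a cutoff are illustrative assumptions rather than part of the reported analysis.

```python
# Earliest point at which eye movements can reflect the noun onset
# (lower bound of the 150-200 ms latency cited in the text).
EYE_MOVEMENT_LATENCY_MS = 150


def is_predictive(onset_ms, latency_ms=EYE_MOVEMENT_LATENCY_MS):
    """Hypothetical helper: a divergence counts as predictive if it begins
    before eye movements could be driven by the noun itself."""
    return onset_ms < latency_ms


# Mean position of the L + H* peak before noun onset, as reported in the text.
l1_peak_before_noun_ms = 187  # German (L1)
l2_peak_before_noun_ms = 102  # English (L2)
timing_advantage_ms = l1_peak_before_noun_ms - l2_peak_before_noun_ms

# Onsets of the significant divergence windows in the LHA conditions.
l1_onset_ms = 0
l2_onset_ms = 275
l2_adjusted_onset_ms = l2_onset_ms - timing_advantage_ms

print(timing_advantage_ms)                  # 85
print(l2_adjusted_onset_ms)                 # 190
print(is_predictive(l1_onset_ms))           # True
print(is_predictive(l2_adjusted_onset_ms))  # False
```

Even after the 85 ms adjustment, the L2 divergence onset of about 190 ms falls after the 150 ms cutoff, which is the arithmetic behind the conclusion that the L2 data show no evidence of predictive processing.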

This pattern of results supports resource-deficit accounts over exposure-based accounts for several reasons. Resource-deficit accounts assume no fundamental differences between L1 and L2 processing, but attribute observed differences to resource limitations. Participants’ L1 processing seems to differ from their L2 processing not fundamentally, but in terms of processing speed. Specifically, participants use prosodic cues in both their L1 and L2 in a similar manner, but more slowly in the L2, such that we find evidence for predictive processing in their L1, but not in their L2. These differences could be due to slower lexical access in the L2 or less automatic processing routines. Notice that participants’ L1 and L2 processing differs even though their two languages use the same prosodic cue to mark a lexical contrast, which would allow participants to use knowledge and processing routines from their L1 in the L2.

The evidence from the first and second halves of the experiment suggests that participants adjusted their predictions in the direction expected by exposure-based accounts in the L1, but not the L2. Exposure-based accounts would predict that participants should engage in less predictive processing over the course of the experiment because the speaker’s use of prosodic cues was inconsistent. Moreover, listeners were exposed to trials in which they experienced a prosodic garden-path effect, namely, when hearing a sequence like Click on the red duck. Click on the GREEN banana. Such a pattern is not only infelicitous, it also almost never occurs in natural production data (Ito & Speer, 2006). From a constraint-based perspective, these garden-path trials should lead to a large prediction error, that is, a large difference between what participants predicted and what is then encountered. Such prediction errors should lead to an adjustment of the predictions, such that participants should be less and less likely to use contrastive pitch accents to predict a repeated noun over the course of experimental trials. This happened in the L1, but not the L2, where participants engaged in more rather than less predictive processing over the course of the experiment.

There are several possible explanations for the unexpected L2 findings, and further studies are clearly needed to determine which mechanisms underlie the L2 patterns found here. One possibility is that participants did not show predictive processing in the initial half of the experiment because of resource deficits. Specifically, participants may initially have been substantially slower in their processing, especially in terms of processing routines and/or lexical access. Such slower processing would explain why there is no evidence for predictive L2 processing in the first half of the experiment. As participants get used to the task and the lexical items used, their lexical access and processing may speed up, yielding the reliable predictive processing in the second half of the experiment. Such an explanation would also entail that participants were less sensitive in the L2 than in the L1 to the speaker’s prosodic cues being inconsistent. Otherwise, their L2 predictive processing should nevertheless have decreased over the course of the experiment, not increased. It is possible that participants’ processing resources in the L2 are taken up by processes such as lexical access and developing processing routines, so that fewer resources are available to track the consistency of the speaker’s prosodic cues.