Introduction
People often predict upcoming events. Based on the notion that prediction is a fundamental aspect of human processing (e.g., Bar, Reference Bar2003; Clark, Reference Clark2013; Friston, Reference Friston2010), there has been a surge of interest in predictive processing in the field of psycholinguistics. A wealth of research suggests that prediction is an important characteristic of language processing, but the exact nature of the processes and representations involved remains hotly debated (for extensive discussion, see Hale, Reference Hale2001; Federmeier, Reference Federmeier2007; Levy, Reference Levy2008; Altmann & Mirković, Reference Altmann and Mirković2009; Hickok, Reference Hickok2012; Van Petten & Luka, Reference Van Petten and Luka2012; Falandays, Nguyen, & Spivey, Reference Falandays, Nguyen and Spivey2021; Gibson, Bergen, & Piantadosi, Reference Gibson, Bergen and Piantadosi2013; Pickering & Garrod, Reference Pickering and Garrod2013; Dell & Chang, Reference Dell and Chang2014; Huettig, Reference Huettig2015; Kuperberg & Jaeger, Reference Kuperberg and Jaeger2016; Norris, McQueen, & Cutler, Reference Norris, McQueen and Cutler2016; Ferreira & Chantavarin, Reference Ferreira and Chantavarin2018; Pickering & Gambi, Reference Pickering and Gambi2018; Ferreira & Qiu, Reference Ferreira and Qiu2021; Onnis & Huettig, Reference Onnis and Huettig2021; Huettig, Audring, & Jackendoff, Reference Huettig, Audring and Jackendoffin press). One important open question in this regard is to what extent prediction in language processing is modulated by i) individual differences, and ii) challenging situations (Huettig & Mani, Reference Huettig and Mani2016).
Prediction in language processing in mature language users has previously been shown to be influenced by individual differences in working memory capacity and processing speed (Huettig & Janse, Reference Huettig and Janse2016), age (Federmeier, Kutas, & Schul, Reference Federmeier, Kutas and Schul2010; Huang, Meyer, & Federmeier, Reference Huang, Meyer and Federmeier2012; Wlotko & Federmeier, Reference Wlotko and Federmeier2012), literacy (Brouwer, Mitterer, & Huettig, Reference Brouwer, Mitterer and Huettig2013; Huettig & Pickering, Reference Huettig and Pickering2019; Favier, Meyer, & Huettig, Reference Favier, Meyer and Huettig2021; Mishra, Singh, Pandey, & Huettig, Reference Mishra, Singh, Pandey and Huettig2012), and second language proficiency when predicting in L2 (Dussias, Valdés Kroff, Guzzardo Tamargo, & Gerfen, Reference Dussias, Valdés Kroff, Guzzardo Tamargo and Gerfen2013; Hopp, Reference Hopp2013; but cf. Hopp, Reference Hopp2015; Ito, Pickering, & Corley, Reference Ito, Pickering and Corley2018; Sagarra & Casillas, Reference Sagarra and Casillas2018). We believe that it is fair to say that, despite this considerable number of studies, it is still somewhat unclear how pervasive prediction is across various challenging situations (but see Brouwer et al., Reference Brouwer, Mitterer and Huettig2013; Huettig & Guerra, Reference Huettig and Guerra2019).
In the present study we tested the limits of prediction by asking native speakers of Dutch who were L2 speakers of English to interpret Dutch sentences into English in consecutive interpreting (CI) and simultaneous interpreting (SI) tasks. Given that interpreting is an extremely demanding task, especially for untrained bilinguals, we chose to test their tendency to predict language in a strongly supportive task environment: verb-based semantic prediction in a visual world context.
Prediction in challenging situations
Some research suggests that, despite the advantage of making language processing more efficient, predictive processing is far from effortless. Ito, Corley, and Pickering (Reference Ito, Corley and Pickering2018), for example, found that taxing participants’ working memory delayed predictive eye-movements. This finding, together with the findings in Huettig and Janse (Reference Huettig and Janse2016) and Chun, Chen, Liu, and Chan (Reference Chun, Chen, Liu, Chan, Kaan and Grüter2021), suggests that prediction, at least in challenging situations, is constrained by available cognitive resources. In addition, prior empirical efforts also show that limited processing time causes reduced prediction in both listening (Huettig & Guerra, Reference Huettig and Guerra2019) and reading (Ito, Corley, Pickering, Martin, & Nieuwland, Reference Ito, Corley, Pickering, Martin and Nieuwland2016). Challenging situations impeding predictive processing also relate to some perceptual difficulties in adverse conditions, such as casual speech with many phonological reductions (Brouwer et al., Reference Brouwer, Mitterer and Huettig2013) and foreign-accented speech with unreliable and potentially ambiguous input (Porretta, Buchanan, & Järvikivi, Reference Porretta, Buchanan and Järvikivi2020; Romero-Rivas, Martin, & Costa, Reference Romero-Rivas, Martin and Costa2016; Schiller et al., Reference Schiller, Boutonnet, De Heer Kloots, Meelen, Ruijgrok and Cheng2020).
Motivated by ‘prediction-is-production’ views, which posit a fundamental role of the production system for prediction in language processing (Pickering & Gambi, Reference Pickering and Gambi2018; Pickering & Garrod, Reference Pickering and Garrod2013), several studies explored prediction by accompanying a comprehension task with a production task. Most of these studies obtained some evidence consistent with a role for the production system (Hintz, Meyer, & Huettig, Reference Hintz, Meyer and Huettig2016; Lelonkiewicz, Rabagliati, & Pickering, Reference Lelonkiewicz, Rabagliati and Pickering2021; Rommers, Dell, & Benjamin, Reference Rommers, Dell and Benjamin2020) though, arguably, overall the evidence is still limited.
Finally, compared to native language processing, L2 settings appear to impose extra challenges on predictive processing. While some studies provided evidence for the occurrence of prediction in L2 to a similar extent as in L1 (e.g., Chambers & Cooke, Reference Chambers and Cooke2009; Dijkgraaf, Hartsuiker, & Duyck, Reference Dijkgraaf, Hartsuiker and Duyck2017), ample work showed smaller, delayed, or null effects of prediction among L2 speakers (for reviews, see Kaan, Reference Kaan2014; Ito & Pickering, Reference Ito, Pickering, Kaan and Grüter2021; Kaan & Grüter, Reference Kaan, Grüter, Kaan and Grüter2021). This can be attributed to the fact that non-native language processing is generally more resource-demanding and non-automatic in some sub-processes, including accessing lexical representations (McDonald, Reference McDonald2006), building syntactic representations (Clahsen & Felser, Reference Clahsen and Felser2006), and determining sentence meanings (MacWhinney & Bates, Reference MacWhinney and Bates1989). Consequently, it is conceivable that, at least during the early stages of L2 processing, there are limited time and resources available for prediction. L2 speakers, for example, face difficulties in using lexical or grammatical features that are reliable for prediction but absent in their L1 (e.g., Dussias et al., Reference Dussias, Valdés Kroff, Guzzardo Tamargo and Gerfen2013; Hopp, Reference Hopp2013, Reference Hopp2016; Lew-Williams & Fernald, Reference Lew-Williams and Fernald2010; Mitsugi & Macwhinney, Reference Mitsugi and Macwhinney2016). In addition, interference from L1 forms another source of challenge. Given that L2 speakers are often more dominant and proficient in their L1, cross-linguistic influence tends to run largely automatically from L1 to L2 (Karaca, Brouwer, Unsworth, & Huettig, Reference Karaca, Brouwer, Unsworth, Huettig, Kaan and Grüter2021), which delays the pre-activation of lexical representations of L2 words and thus makes prediction less efficient.
In short, several challenging situations limit predictive processing, including those in which processing resources are taxed, the production system is occupied concurrently, and non-native language processing is involved.
Prediction and interpreting
The case of prediction during interpreting is particularly interesting because the interpreting task involves several of the challenges for predictive processing mentioned above. Interpreting is a linguistically and cognitively demanding bilingual experience, in which interpreters must comprehend one language and produce another language under extreme time pressure (Dong & Li, Reference Dong and Li2020; Frauenfelder & Schriefers, Reference Frauenfelder and Schriefers1997). There are two typical interpreting types, namely, consecutive interpreting (CI) and simultaneous interpreting (SI). CI is a two-stage process, where interpreters must comprehend the speech input in the source language first and subsequently produce the output in the target language (Pöchhacker, Reference Pöchhacker, Malmkjær and Windle2011a), with memory load accumulating until the interpretation is finished (Liang, Fang, Lv, & Liu, Reference Liang, Fang, Lv and Liu2017). In SI, production is in synchrony with perception and comprehension of language information in the source language (Pöchhacker, Reference Pöchhacker, Malmkjær and Windle2011b). Given the need to divide attention among multiple tasks simultaneously, SI is generally regarded as the more challenging interpreting type.
Considering that prediction may be more limited in situations when cognitive load, time pressure, concurrent production, and additional L2 processing are involved (reviewed in the preceding section), further research is warranted to explore prediction during the challenging circumstances of interpreting. It is noteworthy that traditional interpreting accounts consistently assume an important role of prediction in achieving successful interpreting. This role is usually assumed because the potential benefits of prediction may motivate interpreters to use it as a practical strategy (Moser, Reference Moser, Gerver and Sinaiko1978; Gerver, Longley, Long, & Lambert, Reference Gerver, Longley, Long and Lambert1984; Setton, Reference Setton, Dam, Engberg and Gerzymisch-Arbogast2005). One benefit concerns time pressure, which is a heavy burden for interpreters, especially in simultaneous interpreting: by relying on prediction, interpreters have a chance to maintain a shorter time lag between the onset of input and output, and thus to keep pace with the speaker. A second benefit concerns cognitive resource constraints, another challenge imposed on interpreters by the task situation, which account for impaired fluency, numerous errors, omissions and infelicities in interpreting (Gile, Reference Gile2009). In this regard, prediction has the potential to ease the high cognitive load caused by the multiplicity and simultaneity of interpreting (Gile, Reference Gile2009; Seeber & Kerzel, Reference Seeber and Kerzel2011), and is commonly taught as an efficient strategy (Li, Reference Li2015). Recently, Amos and Pickering (Reference Amos and Pickering2020) proposed a theory of prediction in simultaneous interpreting based on a set of psycholinguistic studies on prediction. The authors assume that a prediction-by-production mechanism may underlie prediction in SI, in which interpreters rely on their production system to deploy rapid prediction with semantic, syntactic and phonological representations involved.
From theoretical modelling to empirical testing, a number of studies have been conducted, but few of them provide solid evidence for prediction in interpreting due to limitations of research focus and design. One reason is that the definition of prediction within the framework of interpreting studies tends to be vague (Liontou, Reference Liontou2012). Another reason is that exploring prediction during the processing of the source language is rare (Wilss, Reference Wilss, Gerver and Sinaiko1978; Van Besien, Reference Van Besien1999; Kurz & Färber, Reference Kurz and Färber2003). Some empirical interpreting studies sought to tap predictive processing during comprehension using measures such as latency (van Hell & de Groot, Reference van Hell and de Groot2008; Chmiel, Reference Chmiel2016; Hodzik & Williams, Reference Hodzik and Williams2017; Chmiel, Reference Chmiel2021). Hodzik and Williams (Reference Hodzik and Williams2017), for example, attempted to index prediction using latency measures between the target word in the source language and its equivalent in the target language, showing that target words were interpreted faster in the high (vs. low) constraining condition. However, one must interpret such data as evidence for prediction with caution, because they can also be explained by integration (for extensive discussion, see Pickering & Gambi, Reference Pickering and Gambi2018). To obtain solid evidence for prediction, measuring the pre-activation of linguistic representations is the key, which, however, is difficult to detect using off-line measures.
It is clear that, with prediction defined as the pre-activation of linguistic information, more appropriate on-line methods are needed to examine prediction in interpreting. Several studies based on eye-tracking data indicated that experience with interpreting and code-switching could help bilinguals predict target linguistic units based on grammatical gender (Valdés Kroff, Dussias, Gerfen, Perrotti, & Bajo, Reference Valdés Kroff, Dussias, Gerfen, Perrotti and Bajo2017) and morphological cues (Lozano-Argüelles, Sagarra, & Casillas, Reference Lozano-Argüelles, Sagarra and Casillas2019). Although these empirical efforts strengthen the grounds of prediction in interpreting, the current evidence for the important role of prediction is scarce. Up to now, direct evidence that prediction occurs during interpretation appears to be limited to a recent PhD dissertation by Amos (Reference Amos2020), who, using the visual world paradigm, found that prediction often takes place in both SI (interpreting from L2 English to L1 French) and CI (interpreting from L2 English to L1 Dutch).
The current study
Here, we sought to test the limits of prediction by observing prediction in two challenging tasks, i.e., consecutive and simultaneous interpreting. The former setting involves a production process following a comprehension process, while in the latter setting comprehension and production overlap. By doing so, we can test whether different task settings affect prediction. We focused on the prediction of the L1 source language during interpreting because we were particularly interested in prediction in challenging situations. Investigating prediction of the source language allowed us to compare our results to previous results during L1 listening without any interpreting tasks (Hintz, Meyer, & Huettig, Reference Hintz, Meyer and Huettig2017).
To this end, we conducted two visual world eye-tracking experiments. The eye-tracking method allowed us to measure semantic prediction in speech processing of the source sentences unequivocally, that is, before participants heard the anticipated target. Participants’ anticipatory eye movements to the target objects in the predictable and nonpredictable conditions were recorded while they engaged in two Dutch–English interpreting tasks. In Experiment 1, participants were asked to interpret consecutively while they were looking at co-present visual objects. In Experiment 2, a different set of participants was asked to interpret simultaneously, which is the more demanding interpreting task. Experiment 1 and Experiment 2 were carried out in parallel. Participants were from a homogeneous undergraduate student population.
Notably, we used the same manipulation of verb-noun predictability, the same participant population, and the same spoken and visual stimuli as Hintz et al. (Reference Hintz, Meyer and Huettig2017), not only because their manipulation and stimuli have been shown to elicit robust anticipatory eye movements with a large effect size in Dutch L1 processing, but also because this allowed us to directly compare prediction across tasks that differ in how challenging they are: a) mere comprehension (Hintz et al., Reference Hintz, Meyer and Huettig2017); b) consecutive interpreting; c) simultaneous interpreting. The interpreting direction from L1 Dutch to L2 English was chosen partly for the same reason, namely to enable comparison with the earlier study. In addition, the L1-L2 interpreting direction is common practice, especially on national markets (Denissenko, Reference Denissenko, Gran and Dodds1989; Lim, Reference Lim2005; Chmiel, Reference Chmiel2016, Reference Chmiel2021), although the reverse L2-L1 direction is more widely used, especially in international organizations such as the UN (Donovan, Reference Donovan2004; Pavlović, Reference Pavlović2007; Nicodemus & Emmorey, Reference Nicodemus and Emmorey2013), and is favored by interpreting studies on prediction (Hodzik & Williams, Reference Hodzik and Williams2017; Amos, Reference Amos2020). The current study thus also complements prior interpreting studies by focusing on a less frequently tested interpreting direction.
Experiment 1
Participants
Thirty-three participants from the participant pool of the Max Planck Institute for Psycholinguistics were paid for their participation. The data from thirty participants (24 females; mean age = 22.03, SD = 2.06) were used for analysis (for data exclusion, see the Results and interim discussion section below). All of them were students at Radboud University, with Dutch as their native language. They all reported using English frequently. On average, the participants had started learning English at around the age of 10 (M = 9.70, SD = 2.22). English-language television programs are typically not dubbed in the Netherlands, and thus daily exposure to English is a normal part of Dutch life.
All participants had normal or corrected-to-normal vision as well as normal hearing. All participants gave informed written consent. Ethical approval to conduct the study was provided by the ethics board of the Social Sciences faculty at Radboud University.
To assess their English proficiency, participants were asked to rate their own level of English in terms of reading, speaking, writing and understanding spoken language, using a Likert-type scale (1 = very low, 7 = very comfortable). They considered themselves highly proficient in English (reading: M = 6.20, SD = 1.01; speaking: M = 5.33, SD = 1.32; writing: M = 5.43, SD = 1.36; understanding spoken language: M = 6.23, SD = 0.80). Furthermore, we administered the (English) National Adult Reading Test (NART) and the English version of the Peabody Picture Vocabulary Test to assess their reading skills and receptive vocabulary size in English. These tests were carried out after the eye-tracking experiment.
National Adult Reading Test
The National Adult Reading Test comprises 50 written words in British English, which have irregular pronunciation. The NART was developed by Nelson (Reference Nelson1982) and, along with its American English version (Blair & Spreen, Reference Blair and Spreen1989), is widely used as a measure of premorbid intelligence levels of English-speaking patients with dementia. More importantly for the present purposes, NART performance highly correlates with adults’ reading and verbal comprehension skills (Bright, Hale, Gooch, Myhill, & van der Linde, Reference Bright, Hale, Gooch, Myhill and van der Linde2018). Thus, the English version of NART was used to assess participants’ verbal comprehension skills in L2, which is an important component of general English proficiency.
Participants were told to read the 50 words slowly and aloud, and they were encouraged to guess the pronunciation of words they were unfamiliar with. They were allowed to correct their responses and the test was untimed. Following the NART scoring guidelines, a score for each participant was calculated based on the number of errors using the following formula:
Peabody Picture Vocabulary Test
The Peabody Picture Vocabulary Test was developed by Dunn and Dunn (Reference Dunn and Dunn1997) and has been used widely to measure receptive vocabulary size (also for participants in visual world eye-tracking experiments, e.g., Borovsky, Elman, & Fernald, Reference Borovsky, Elman and Fernald2012; Rommers, Meyer, & Huettig, Reference Rommers, Meyer and Huettig2015; Hintz et al., Reference Hintz, Meyer and Huettig2017). A digitized version of the English Peabody test was used in the current study to assess participants’ L2 lexical ability, which is another important component of L2 proficiency. Following the standard protocol of the test, on each trial, participants heard a word and saw four numbered pictures. Participants were asked to give the number (1, 2, 3, or 4) of the picture that corresponded to the spoken word. Trials were presented in blocks of 12 that increased in difficulty. The test ended if fewer than five correct responses were provided within the current block. A participant’s score was the number of the last item they saw minus the number of errors made. Because our participants were Dutch non-native speakers of English, we did not apply the age-sensitive transformation procedure described in the test manual, as the population norms were based on native speakers of English.
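The stopping and scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: items are taken to be numbered from 1 in presentation order, responses are stored as booleans, and the function name `peabody_raw_score` is hypothetical rather than part of the test software.

```python
from typing import Sequence

BLOCK_SIZE = 12   # trials per block, as described in the text
MIN_CORRECT = 5   # the test stops when a block has fewer correct responses


def peabody_raw_score(responses: Sequence[bool]) -> int:
    """Return the raw score: number of the last item seen minus errors.

    `responses` holds one boolean per item in presentation order
    (True = correct). Items after the stopping block are ignored.
    Assumption: items are numbered consecutively starting at 1.
    """
    last_item = 0
    errors = 0
    for start in range(0, len(responses), BLOCK_SIZE):
        block = responses[start:start + BLOCK_SIZE]
        last_item = start + len(block)
        errors += sum(1 for correct in block if not correct)
        # Stop once the current block has fewer than MIN_CORRECT correct answers.
        if sum(block) < MIN_CORRECT:
            break
    return last_item - errors
```

For example, a participant who answers the first block perfectly and then gets only four items right in the second block stops at item 24 with eight errors.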
The raw scores of both NART and Peabody test were used for analysis, with their descriptive results shown in Table 1a. The results of the two language proficiency tests as well as self-rating scores correlated with each other positively and robustly, see Table 1b. The overall results reveal that the participants were highly proficient in L2 English.
Notes: *0.01 < p < 0.05, **0.001 < p < 0.01, ***p < 0.001
Stimuli
The same Dutch sentence recordings and visual displays as in Hintz et al. (Reference Hintz, Meyer and Huettig2017) were used. The materials consisted of 40 target nouns and 80 verbs used in the sentence “The man (verb) at this moment a (noun)”. The adverbial phrase “at this moment” separated verb and noun, and was included to give participants enough opportunity to engage in predictive language processing. The stimulus sentences lasted, on average, 2483 ms. The resulting sentence construction is deemed quite natural by native speakers of Dutch. Each target noun appeared in two versions, as a predictable and as a nonpredictable item, depending on the verb preceding it (e.g., “De man schilt/tekent op dit moment een appel”, the man peels/draws at this moment an apple; see Appendix A for all items). Each target noun was paired with a set of four objects, one of which was a depiction of the target noun, while the other three were unrelated distractors (see Figure 1 for an example).
To evaluate whether predictable and nonpredictable sentences were classified properly, Hintz et al. (Reference Hintz, Meyer and Huettig2017) had pretested all sentences for cloze probability according to Taylor (Reference Taylor1953). In the predictable condition, the mean cloze probability of the target nouns was .39 (SD = .24; ranging from .06 to .8); in the nonpredictable condition, it was zero. In addition, a series of pretests assessing the verb-noun relationship was conducted, including free association strength, plausibility, and typicality ratings (for more details, see Hintz et al., Reference Hintz, Meyer and Huettig2017).
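As a reminder of what the cloze measure captures, the following Python sketch computes cloze probability as the proportion of respondents who complete a sentence frame with the target word. The completions below are invented for illustration and are not the pretest data of Hintz et al.; the function name `cloze_probability` is likewise hypothetical.

```python
from collections import Counter
from typing import Iterable


def cloze_probability(completions: Iterable[str], target: str) -> float:
    """Proportion of completions that match the target word.

    Matching ignores case and surrounding whitespace.
    """
    normalized = [c.strip().lower() for c in completions]
    if not normalized:
        raise ValueError("at least one completion is required")
    return Counter(normalized)[target.strip().lower()] / len(normalized)


# Invented completions for the frame "De man schilt op dit moment een ..."
completions = [
    "appel", "appel", "aardappel", "sinaasappel", "appel",
    "peer", "aardappel", "appel", "banaan", "wortel",
]
probability = cloze_probability(completions, "appel")
```

In this made-up sample, four of ten respondents produce the target, so the target has a cloze probability of .4.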
Procedure
The eye-tracking experiment consisted of 80 experimental items (40 target nouns presented in predictable and nonpredictable conditions) in total. Predictable and nonpredictable items were evenly distributed across two lists such that the same target noun did not appear twice on one list. Specifically, each list contained all the target nouns (40), with half of them (20) paired with a predictable verb and the other half (20) paired with a nonpredictable verb. Participants were randomly assigned to one list and sat in a sound-shielded room. Eye movements were tracked using an EyeLink 1000 Tower Mount (SR Research) sampling at 1000 Hz.
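The counterbalancing scheme described above can be illustrated with a short Python sketch. How nouns were actually allocated to conditions within each list is not reported here, so the alternating assignment below is an assumption, as are the function name `build_lists` and the placeholder noun labels.

```python
from typing import List, Sequence, Tuple

Item = Tuple[str, str]  # (target noun, condition)


def build_lists(nouns: Sequence[str]) -> Tuple[List[Item], List[Item]]:
    """Distribute each noun's two conditions across two lists.

    Every noun appears once per list, and the condition it receives on
    list A is the opposite of the one it receives on list B.
    Assumption: conditions alternate by position in `nouns`.
    """
    list_a: List[Item] = []
    list_b: List[Item] = []
    for index, noun in enumerate(nouns):
        if index % 2 == 0:
            first, second = "predictable", "nonpredictable"
        else:
            first, second = "nonpredictable", "predictable"
        list_a.append((noun, first))
        list_b.append((noun, second))
    return list_a, list_b


# Placeholder labels standing in for the 40 target nouns.
nouns = [f"noun_{i:02d}" for i in range(1, 41)]
list_a, list_b = build_lists(nouns)
```

With 40 nouns, each list ends up with 20 predictable and 20 nonpredictable items, and no noun occurs twice on the same list.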
After successful calibration of the eye-tracker, participants received the task instruction: they were told to listen to the sentences carefully and interpret them from Dutch to English. Importantly, participants were instructed to interpret in a consecutive fashion. That is, they should listen to a given sentence first and start producing the translated sentence after the spoken sentence had ended. In line with previous studies, no explicit instruction was given as to where they should look on the visual display (i.e., a look-and-listen task, for further discussion, see Huettig, Rommers, & Meyer, Reference Huettig, Rommers and Meyer2011).
Each trial began with a central fixation dot presented for two seconds. After the dot disappeared, a picture consisting of 4 objects was displayed and then the playback of the sentence started. The presentation of the visual displays was timed to precede the onset of the spoken verb by one second to provide sufficient time to preview all four objects. The position of the four objects was random on a (virtual) 2 × 2 grid (Figure 1, for an example). A beep marked the end of the spoken sentence and indicated to participants that they could initiate their interpretation. The visual display of four objects remained in view until the end of the trial, see Figure 2. Each participant was presented with all 40 trials on one list (20 trials for predictable and the other 20 trials for nonpredictable condition). The order of trials was randomized automatically before the experiment. The eye-tracking experiment, including calibration and validation, took approximately 10 min.
Data Analysis
Four areas of interest (200 × 200 pixels) were defined for the four objects in the display. Using the algorithm provided by the EyeLink software, eye gaze was analyzed in terms of fixations directed to the target object or to one of the three unrelated distractors, or elsewhere. We plotted participants’ fixation proportions for each object (target, distractors) and each condition (predictable, nonpredictable) during the whole interpreting process (Figure 3), spanning 2.5 seconds before and 5 seconds after target word onset. This period captured both comprehension and production processes.
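To make the analysis steps concrete, the following Python sketch assigns gaze positions to areas of interest and computes the proportion of looks to each object per time bin relative to target word onset. It is a simplified illustration and not the EyeLink algorithm used in the study: it works on gaze samples rather than parsed fixations, and the AOI centre coordinates, the 50 ms bin width, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Tuple

AOI_SIZE = 200  # side length in pixels, as in the text


@dataclass(frozen=True)
class Aoi:
    name: str
    cx: float  # centre x (hypothetical screen coordinates)
    cy: float  # centre y

    def contains(self, x: float, y: float) -> bool:
        half = AOI_SIZE / 2
        return abs(x - self.cx) <= half and abs(y - self.cy) <= half


def classify(x: float, y: float, aois: Sequence[Aoi]) -> str:
    """Return the name of the AOI containing the point, or 'elsewhere'."""
    for aoi in aois:
        if aoi.contains(x, y):
            return aoi.name
    return "elsewhere"


def fixation_proportions(
    samples: Iterable[Tuple[int, float, float]],
    aois: Sequence[Aoi],
    bin_ms: int = 50,
    window: Tuple[int, int] = (-2500, 5000),
) -> Dict[int, Dict[str, float]]:
    """Proportion of samples per AOI in each time bin.

    `samples` are (time in ms relative to target onset, x, y) tuples.
    Bins are labelled by their start time within `window`.
    """
    counts: Dict[int, Dict[str, int]] = {}
    for t, x, y in samples:
        if not (window[0] <= t < window[1]):
            continue
        bin_start = window[0] + ((t - window[0]) // bin_ms) * bin_ms
        label = classify(x, y, aois)
        bin_counts = counts.setdefault(bin_start, {})
        bin_counts[label] = bin_counts.get(label, 0) + 1
    return {
        b: {label: n / sum(c.values()) for label, n in c.items()}
        for b, c in sorted(counts.items())
    }
```

The default window matches the 2.5 s before and 5 s after target word onset described above; averaging such per-trial proportions within participants and conditions would yield curves of the kind plotted in Figure 3.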
A magnitude estimation approach was used for data analysis. This was in line with the ‘new statistics’ approach (Cumming, Reference Cumming2014), which advocates a change from null-hypothesis testing to interpreting results by using measures of effect sizes and confidence intervals. As Fidler and Loftus (Reference Fidler and Loftus2009) showed empirically, reporting confidence intervals leads to a better interpretation of results than reporting null-hypothesis tests (for extensive discussion, see Cumming, Reference Cumming2012; Cumming, Reference Cumming2014). We report the mean fixation proportions accompanied by by-participant 95% confidence intervals (areas shaded in grey), see Figure 3. As in previous studies (e.g., Huettig & Janse, Reference Huettig and Janse2016; Hintz et al., Reference Hintz, Meyer and Huettig2017; Huettig & Guerra, Reference Huettig and Guerra2019), in doing so we provide a detailed graphical description of eye movements over time in each experimental condition.
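For readers who want to see how such an interval can be obtained, the Python sketch below computes a mean and a textbook t-based 95% confidence interval across participants for a single time bin. The exact procedure used for Figure 3 is not spelled out in the text, so this is an assumption; the critical value is hard-coded for 30 participants (29 degrees of freedom), and the example values are invented.

```python
import math
import statistics
from typing import Sequence, Tuple

T_CRIT_DF29 = 2.045  # two-sided 95% critical t value for 29 degrees of freedom


def mean_ci(
    values: Sequence[float], t_crit: float = T_CRIT_DF29
) -> Tuple[float, float, float]:
    """Return (mean, lower bound, upper bound) of a t-based interval.

    `values` holds one fixation proportion per participant for one time bin.
    Pass a different `t_crit` when the number of participants is not 30.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two participants are required")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, mean - t_crit * sem, mean + t_crit * sem


# Invented per-participant proportions for one time bin (n = 30).
proportions = [0.0, 1.0] * 15
mean, lower, upper = mean_ci(proportions)
```

Repeating this computation for every time bin and condition gives the band that is drawn around each mean curve.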
Results and interim discussion
The data of three participants were excluded because they did not fixate any displayed object on more than 25% of trials. The recordings of participants’ interpreted sentences were scored for accuracy and transcribed using Praat (Boersma, Reference Boersma2001). Interpreting outputs were scored as correct if they were identical to our translation of the target sentence or when a semantically similar verb and/or noun was used. In Experiment 1, the overall accuracy of interpreting was 87.58% (SD = 11.38) and 12.42% of data (incorrect translations) were excluded from further analyses (see Footnote 1).
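The participant-level exclusion criterion can be stated compactly in Python. This is a minimal sketch assuming that, for each trial, it is already known whether the participant fixated at least one displayed object; the function name `should_exclude` is hypothetical.

```python
from typing import Sequence

MAX_MISSING = 0.25  # exclude when more than 25% of trials lack object fixations


def should_exclude(trial_has_object_fixation: Sequence[bool]) -> bool:
    """True if the share of trials without any object fixation exceeds 25%."""
    if not trial_has_object_fixation:
        raise ValueError("at least one trial is required")
    missing = sum(1 for fixated in trial_has_object_fixation if not fixated)
    return missing / len(trial_has_object_fixation) > MAX_MISSING
```

With 40 trials per participant, this rule excludes a participant from the 11th trial without object fixations onward, since 10 of 40 is exactly 25% and does not exceed the threshold.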
Participants completed their interpretation numerically earlier in the predictable condition (M = 6571 ms, SD = 465 ms) than in the nonpredictable condition (M = 6686 ms, SD = 492 ms), but this difference was not statistically significant (t = 0.93, p = .357, d = 0.24, 95% CI [-298, 67]). With regard to accuracy, interpretation was significantly less accurate in the predictable condition (M = 84%, SD = 12%) than in the nonpredictable condition (M = 91%, SD = 9%), t = 2.47, p = .017, d = 0.67, 95% CI [-0.13, -0.01].
Figure 3 presents the fixation proportions for Experiment 1: fixations to the target (solid lines) and to the averaged distractors (dashed lines) over time for the predictable (green) and nonpredictable (red) conditions. The shaded grey areas surrounding the lines represent by-participant 95% confidence intervals (Huettig & Janse, Reference Huettig and Janse2016; Hintz et al., Reference Hintz, Meyer and Huettig2017; Huettig & Guerra, Reference Huettig and Guerra2019). Figure 3 covers a time course of 7500 ms, with time zero indicating the acoustic onset of the spoken target. Consistent with the task instruction, comprehension and production happened sequentially.
In the predictable condition, the likelihood of looking at the target object increased well before it was mentioned, at around one second before target word onset. In contrast, in the nonpredictable condition, participants only looked at the same objects after they were referred to in the speech signal, starting 200 ms after target word onset, which is the time needed to launch a saccadic eye movement (Saslow, Reference Saslow1967). In spite of the differences in fixations prior to target word onset, the eye-movement patterns after target word onset looked very similar in predictable and nonpredictable conditions: fixations to the target objects dropped slightly after the offset of the spoken sentence but increased again after participants had started producing their interpretation.
Overall, the results provide clear evidence for predictive processing in a consecutive interpreting task. In Experiment 2, participants were required to perform a more demanding interpreting task (i.e., simultaneous interpreting). That is, Experiment 2 was identical to Experiment 1, except that participants were instructed to interpret the spoken sentence while they were still listening to it.
Experiment 2
Participants
Another forty-one participants from the participant pool of the Max Planck Institute for Psycholinguistics were recruited for Experiment 2 and were paid for their participation. The data of thirty participants (23 females; mean age = 23.87, SD = 3.68) were used for analysis (for data exclusion, see the Results and interim discussion section below). They were again all students at Radboud University, with Dutch as their native language and English as a frequently used foreign language. Like the participants in Experiment 1, they had started learning English at around the age of 10 (M = 9.87, SD = 1.87). All participants had normal or corrected-to-normal vision as well as normal hearing. All participants gave informed written consent before taking part in the experiment. Ethical approval to conduct the study was provided by the ethics board of the faculty of Social Sciences at Radboud University.
The same self-report as in Experiment 1 was administered and showed that participants rated themselves as highly proficient in English (reading: M = 6.33, SD = 0.69; speaking: M = 5.50, SD = 1.06; writing: M = 5.40, SD = 1.05; understanding spoken language: M = 6.17, SD = 0.78). The results of the NART and the Peabody test are summarized in Table 2a. The self-rating scores, NART scores, and Peabody scores correlated robustly and positively with each other (except for three pairs: NART-speaking, NART-understanding, and Peabody-understanding), see Table 2b. The overall results reveal that the participants were proficient in L2 English.
Notes: *0.01 < p < 0.05, **0.001 < p < 0.01, ***p < 0.001
Note that participants in Experiment 1 and Experiment 2 had comparable levels of English proficiency on all measures in the study (reading: t = 0.58, p = .562, d = 0.16, 95% CI [-0.59, 0.33]; speaking: t = 0.53, p = .599, d = 0.14, 95% CI [-0.80, 0.46]; writing: t = 0.10, p = .917, d = 0.03, 95% CI [-0.61, 0.67]; understanding spoken language: t = 0.32, p = .749, d = 0.08, 95% CI [-0.35, 0.48]; NART score: t = 0.66, p = .509, d = 0.18, 95% CI [-4.74, 2.38]; Peabody score: t = 0.92, p = .359, d = 0.25, 95% CI [-11.51, 4.24]).
Stimuli, procedure and data analysis
Stimuli, procedure, and data analysis were the same as in Experiment 1, except that participants were instructed to interpret the sentences in a simultaneous rather than consecutive fashion (see Figure 4 for the procedure). To that end, participants were asked before the experiment to initiate their interpretation as soon as possible. Additionally, an auditory beep sounded two seconds after the end of the spoken sentence, and participants were told that their interpretation should be finished before the beep. Pretests had shown that this setting was feasible.
Results and interim discussion
Among the forty-one participants, seven did not look at any displayed object on more than 25% of trials, while another four always focused on one or two fixed positions on the screen. According to their post-experiment verbal reports, these eleven participants focused on the interpreting task without viewing the displayed objects. We excluded these participants’ data as they had engaged in a specific form of strategic processing. In addition, the accuracy of interpreting was 86.58% (SD = 11.16%), so that 13.42% of the data (incorrect translations) were excluded from further analyses.
Participants showed a similar efficiency-accuracy trade-off as in Experiment 1. That is, they completed their interpretation numerically earlier in the predictable condition (M = 4346 ms, SD = 365 ms) than in the nonpredictable condition (M = 4461 ms, SD = 341 ms), although this difference was not statistically significant (t = 1.26, p = .212, d = 0.33, 95% CI [-298, 67]). With regard to accuracy, interpretation in the predictable condition (M = 81%, SD = 11%) was significantly less accurate than in the nonpredictable condition (M = 92%, SD = 9%), t = 4.32, p < .001, d = 0.67, 95% CI [-0.16, -0.06].
Figure 5 plots participants’ fixation behavior during the SI process. As can be seen in Figure 5, participants initiated their interpretation approximately 1080 ms after the onset of the spoken Dutch sentence, and 1403 ms before the offset of the spoken sentence. Very similar to Experiment 1, in the predictable condition, the likelihood of looks to the target increased shortly after participants had heard the verb in the spoken sentence, about one second prior to target word onset. Similarly, the time course of fixations to the target object in the nonpredictable condition was comparable to that in Experiment 1: compared to the unrelated distractors, more looks to the target were made about 200 ms after target word onset. Thus, the results suggest that participants predicted the upcoming target noun shortly after hearing the verb, while they had already started interpreting the spoken sentence.
Cross-task analyses
Comparison of Experiment 1 and Experiment 2
To complement our magnitude estimation analysis approach and to quantify differences in eye gaze behavior between the predictable and nonpredictable conditions across Experiments 1 and 2, we additionally fitted a linear mixed-effects model using the lme4 package (Bates, Mächler, Bolker, & Walker, Reference Bates, Mächler, Bolker and Walker2015) in R (R Development Core Team, 2012). To this end, we extracted fixation proportions during the period from verb onset to target onset (i.e., the prediction window) of the spoken Dutch sentences; 200 ms were added to both onsets to account for the time it takes to program and launch a saccadic eye movement (Saslow, Reference Saslow1967). To calculate the dependent variable, we divided each participant's proportion of looks to the target during the prediction window on a given trial by that participant's proportion of looks to the averaged distractors during the same time window. The resulting values were log-transformed. Prior to the division and log-transformation, fixation proportions of 0 or 1 were replaced with 0.01 and 0.99, respectively (cf. Macmillan & Creelman, Reference Macmillan and Creelman1991).
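The computation of the dependent variable described above can be sketched as follows. This is a minimal illustration rather than the analysis script used in the study (which was run in R); the function name and the example proportions are hypothetical, and the natural logarithm is assumed because the base is not specified in the text.

```python
import math

def target_preference(p_target, p_distractor):
    """Log-ratio of looks to the target vs. the averaged distractors.

    Proportions of exactly 0 or 1 are replaced with 0.01 and 0.99 before
    the division, as described in the text. Using the natural log is an
    assumption of this sketch.
    """
    def adjust(p):
        if p == 0:
            return 0.01
        if p == 1:
            return 0.99
        return p

    return math.log(adjust(p_target) / adjust(p_distractor))

# Hypothetical trial-level proportions (not taken from the data set)
examples = [(0.60, 0.10), (1.0, 0.0), (0.25, 0.25)]
for p_t, p_d in examples:
    print(p_t, p_d, round(target_preference(p_t, p_d), 3))
```

On this scale, positive values indicate a preference for the target over the distractors, zero indicates no preference, and negative values indicate more looks to the distractors.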
The model included Experiment (1 vs. 2) and Condition (predictable vs. nonpredictable) as fixed factors. Experiment 1 and the nonpredictable condition, respectively, were mapped onto the intercept (i.e., using treatment/dummy coding; see Footnote 2). To test for a potential influence of participants’ English reading skills and their English receptive vocabulary size on eye movements, NART and PPVT scores (both scaled and centered) were added as continuous predictors. Participants and Items were added as random factors, both with random intercepts. Using a maximal random effects structure (with random slopes for Condition by Participants and Items and random slopes for Experiment by Item) resulted in ‘model singularity’. We systematically simplified the random effects structure until the error no longer occurred. The formula of the final model was: targetpref ~ Exp * Cond * (PPVT_cs + NART_cs) + (1|Participant) + (1 + Cond|Item), data = data, control = lmerControl(optimizer = "bobyqa").
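To illustrate what mapping Experiment 1 and the nonpredictable condition onto the intercept implies for reading the coefficients, the sketch below works through a simplified 2 x 2 case. It is an illustration under stated assumptions: the cell means are hypothetical, and the covariates (PPVT, NART) and the random effects of the actual model are ignored. In such a saturated treatment-coded model fitted by ordinary least squares, each coefficient is a contrast between cell means.

```python
# Hypothetical cell means of the log-ratio measure (illustration only)
cell_means = {
    ("Exp1", "nonpredictable"): 0.10,
    ("Exp1", "predictable"): 0.90,
    ("Exp2", "nonpredictable"): 0.20,
    ("Exp2", "predictable"): 0.95,
}

def treatment_coefficients(means):
    """Coefficients of a saturated 2 x 2 treatment-coded model.

    Reference levels: Experiment 1 and the nonpredictable condition.
    Covariates and random effects are deliberately left out.
    """
    intercept = means[("Exp1", "nonpredictable")]
    # Simple effect of Condition within the reference experiment
    cond = means[("Exp1", "predictable")] - intercept
    # Simple effect of Experiment within the reference condition
    exp = means[("Exp2", "nonpredictable")] - intercept
    # Difference between the Condition effects of the two experiments
    cond_exp2 = means[("Exp2", "predictable")] - means[("Exp2", "nonpredictable")]
    interaction = cond_exp2 - cond
    return {"Intercept": intercept, "Cond": cond, "Exp": exp, "Exp:Cond": interaction}

for name, value in treatment_coefficients(cell_means).items():
    print(name, round(value, 2))
```

Under this coding, the Condition coefficient is the predictability effect in Experiment 1 only, and the Experiment-by-Condition interaction expresses how much that effect changes in Experiment 2.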
This model revealed a significant effect of Condition (β = .82, SEβ = .15, t = 5.63), suggesting that, in Experiment 1, target objects were looked at significantly more than the distractors during the predictive period in the predictable but not in the nonpredictable condition. None of the other factors, predictors or interactions reached statistical significance (see Table 3). Specifically, the lack of a significant interaction between Experiment and Condition indicates that gaze during the predictive window did not differ reliably between the two experiments. The same model without the fixed factor Condition provided a significantly worse fit to the data (χ2(6) = 43.79, p < .001).
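The p-value of the model comparison can be checked from the reported test statistic alone. The sketch below is a stand-alone illustration, not the authors' code: the helper name is hypothetical, and it relies on the closed-form upper-tail probability of the chi-square distribution, which holds for even degrees of freedom. The six degrees of freedom are consistent with dropping the six fixed-effect terms of the formula that involve Condition, although the text does not spell this out.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

# Test statistic and degrees of freedom as reported in the text
p_value = chi2_sf_even_df(43.79, 6)
print(p_value < 0.001)  # True, in line with the reported p < .001
```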
In sum, the complementary mixed-effects modelling analysis suggests that the prediction effects in Experiment 1 and 2 were very similar and that neither NART nor PPVT scores contributed to explaining variance in participants’ gaze behavior. It is noteworthy that 11 of the 41 participants in Experiment 2 did not move their eyes during the trials. Only 3 of 33 participants did not move their eyes in Experiment 1. We will discuss the relevance of this observation in the General Discussion.
Comparison of current study with Hintz et al. (Reference Hintz, Meyer and Huettig2017)
Finally, given the similarity of the present experiments to our previous study (i.e., same materials, L1 input, participants sampled from the same population), we conducted an additional analysis comparing eye gaze across interpreting and comprehension tasks. That is, we assessed whether having an interpreting task, either in a consecutive or simultaneous fashion, leads to differences in fixation behavior, compared to when participants merely comprehend the spoken sentences (i.e., look and listen, Hintz et al., Reference Hintz, Meyer and Huettig2017). To that end, we incorporated the data from Experiment 1 of Hintz et al. (Reference Hintz, Meyer and Huettig2017) into the analysis described above. The model structure was identical, except that PPVT and NART were dropped and that the fixed factor Experiment had three levels, with ‘Hintz2017’ mapped onto the intercept (using treatment/dummy coding; see Footnote 3). As before, the maximal random effects structure yielded ‘model singularity’. We therefore simplified the model until the error no longer occurred. The final model had the following structure: targetpref ~ Exp * Cond + (1 + Cond|Participant) + (1|Item), data = data, control = lmerControl(optimizer = "bobyqa"). Table 4 summarizes the results of this analysis. While Condition, as in the previous model, contributed significantly to explaining variance in eye gaze in the three experiments (larger preference for the target over the unrelated distractors in the predictable than in the nonpredictable condition, β = .98, SEβ = .10, t = 9.68 in our previous experiment, Hintz et al., Reference Hintz, Meyer and Huettig2017), none of the other predictors showed a significant effect. In particular, none of the interactions showed even a trend towards an effect, suggesting that gaze during the predictive window did not differ across the three experiments.
In sum, this analysis suggests that the presence of an interpreting task, where listeners comprehend speech in their native language and translate it into an L2, does not modulate (predictive) fixation behavior as compared to a setting where participants merely comprehend speech in their L1.
General discussion
We investigated the limits of prediction by asking native Dutch speakers, who were also proficient L2 speakers of English, to translate Dutch sentences into their English counterparts during consecutive and simultaneous interpreting. To this end, we conducted two visual-world eye-tracking experiments, in which participants viewed a visual display consisting of four objects (one target and three distractors) while interpreting simple Dutch sentences into English. In both experiments, the main manipulation was the predictability of the spoken sentences. In the predictable condition, participants could use the semantic information of the verbs to predict the upcoming target nouns (e.g., “The man peels the apple”), whereas the target nouns were not predictable in the nonpredictable sentences (e.g., “The man draws the apple”).
In Experiment 1, participants were asked to engage in a consecutive interpreting task – that is, they were asked to comprehend the speech inputs first and render the interpretation after the offset of spoken sentences. The results of Experiment 1 show that the participants fixated the targets before they were mentioned in the predictable condition, but such predictive looks to the targets were not observed in the nonpredictable condition. The bilingual participants of Experiment 1 thus showed anticipatory eye movements to semantically-related upcoming target words in the source language when concurrently planning consecutive interpretation.
Experiment 2 was conducted to examine whether prediction of the source language in novice interpreting can also routinely occur in a more difficult kind of interpreting task, namely simultaneous interpreting. Participants in Experiment 2 were required to interpret the sentences they heard simultaneously, with comprehension and production happening nearly concurrently. The participants of Experiment 2 exhibited anticipatory eye movements to semantically-related upcoming target words in the source language when engaging in simultaneous interpretation.
The present findings thus suggest that proficient L2 speakers can engage in prediction in their L1 despite the adverse conditions imposed by an interpreting task on prediction, including cognitive load, time pressure, L2 processing and concurrent (or subsequent) production. These overall results could also be taken to support the notion that prediction of upcoming semantically-related words in the source language is advantageous for interpreting in both types of interpreting (consistent with recent findings by Amos, Reference Amos2020, but with a focus on the L1-L2 interpreting direction). We note however that accuracy rates in both Experiment 1 and Experiment 2 in the predictable condition were significantly lower than in the nonpredictable condition. This raises the possibility that prediction, in certain situations, may be harmful, or at least is not beneficial (cf. Frisson, Harvey, & Staub, Reference Frisson, Harvey and Staub2017; Huettig & Mani, Reference Huettig and Mani2016; Luke & Christianson, Reference Luke and Christianson2016). Further research could usefully investigate this possibility.
Too taxing to predict or too taxing to move the eyes?
It is important to point out at this juncture that in Experiment 2, when participants engaged in a simultaneous interpreting task, more participants (11 of 41, about every 4th participant) than in Experiment 1 (3 of 33, about every 10th participant) chose not to move their eyes and view the displayed objects. What does this difference mean? Is it the enhanced cognitive burden of the simultaneous interpreting task that caused this difference? Does it mean that prediction is not taking place at all in these cases, or is it that the predictive processing is not manifesting in eye movement behavior as measured through the visual world paradigm? We cannot be sure about the correct answer from the present data, but there are hints in the previous literature that warrant a little speculation. To not move one's eyes in a visual world task is very unusual behavior. Visual-world eye-tracking behavior is a reflection of the tight connection between spoken language processing and visual processing that has been established in a great number of studies (for reviews, see Huettig et al., Reference Huettig, Rommers and Meyer2011; Magnuson, Reference Magnuson2019). When participants hear a word that refers (directly or in an anticipatory fashion) to a visual object in their concurrent visual environment, they quickly and semi-automatically (see Mishra, Olivers, & Huettig, Reference Mishra, Olivers, Huettig, Pammi and Srinivasan2013, for extensive discussion) direct their eye gaze to objects which are similar (e.g., semantically) to the heard word. Indeed, all participants in Hintz et al. (Reference Hintz, Meyer and Huettig2017) showed this typical eye movement behavior. We therefore speculate that the roughly 25% of participants in the simultaneous interpreting task who did not move their eyes failed to do so because of the extreme cognitive burden of this version of the interpreting task. What this means with regard to prediction in these 25% of people is unclear.
It is possible that these 25% of participants did not predict semantically-related upcoming target words in the source language in simultaneous interpreting. This could be due to the higher complexity of SI relative to CI, with the former featuring high degrees of multiplicity and simultaneity. We believe that future work on this particular issue would be useful and informative. If it turns out that these 25% of people did not predict or showed substantially reduced prediction, then this would suggest that there are important limits to prediction (cf. Huettig & Mani, Reference Huettig and Mani2016; Huettig & Guerra, Reference Huettig and Guerra2019) during simultaneous interpreting, given the relatively easy sentences participants translated in the current study. The present data cannot reveal whether this interpretation is correct, and additional research is needed to explore this account. What our data do reveal, however, is that the 75% of participants who engaged in the typical semi-automatic visual world eye gaze behavior showed clear evidence of prediction of the source language also in simultaneous interpreting. Thus, for the vast majority of highly proficient bilinguals, the present results suggest that prediction does not break down during interpreting, even in a very challenging task such as simultaneous interpreting.
Prediction and production
It is noteworthy that the settings of the two experiments here were very similar to the experiments reported in Hintz et al. (Reference Hintz, Meyer and Huettig2017) except for the addition of a production phase (CI task in Experiment 1) and the concurrent execution of comprehension and production (SI task in Experiment 2). This similarity motivated us to compare prediction across tasks of differing difficulty, which is another novel contribution of the present study. With the direct involvement of production processes, the current findings are consistent with accounts of prediction in language processing which assume a role for the production system during comprehension (Federmeier, Reference Federmeier2007; Pickering & Garrod, Reference Pickering and Garrod2013; Dell & Chang, Reference Dell and Chang2014; Huettig, Reference Huettig2015; Pickering & Gambi, Reference Pickering and Gambi2018). However, unlike relevant studies demonstrating reduced prediction when the production system was ‘occupied’ (Martin, Branzi, & Bar, Reference Martin, Branzi and Bar2018), or boosted prediction in the case of increased engagement of the production system (Hintz et al., Reference Hintz, Meyer and Huettig2017; Rommers et al., Reference Rommers, Dell and Benjamin2020; Lelonkiewicz et al., Reference Lelonkiewicz, Rabagliati and Pickering2021), the current study showed null effects of the addition of a production phase on anticipatory eye gaze.
Adaptive prediction
It is also surprising that the prediction effects were similar between mere comprehension and interpreting tasks and across different interpreting tasks (at least in 75% of the participants), given the different cognitive challenges and processing mechanisms involved in them (Liang et al., Reference Liang, Fang, Lv and Liu2017; Liang, Lv, & Liu, Reference Liang, Lv and Liu2019; Jia & Liang, Reference Jia and Liang2020). It is conceivable that these results arise because variability in prediction reflects not only the passive constraints of various mediating factors but also the ‘active’ adaptability of the language users involved. Kuperberg and Jaeger (Reference Kuperberg and Jaeger2016), for example, have put forward a ‘utility view of prediction’: language users dynamically adjust their predictive behavior by weighing the costs and benefits of prediction for achieving their communicative goals. Returning to the current findings, participants faced both extra challenges (cognitive load, time pressure, concurrent production and cross-language processing) and potential benefits of prediction (relieving cognitive burden and dealing with intense time pressure) during the interpreting tasks. How such a potential ‘cost-benefit analysis’ plays out in specific communicative situations as well as on a mechanistic level is another interesting implication of, and challenge arising from, the present study for further work.
Ecological considerations
Finally, we note that in the present study we chose a strongly supportive task environment: verb-based semantic prediction in a visual world context. Semantic prediction effects are typically much larger than syntactic or phonological prediction effects in native language processing (but see Ferreira & Qiu, Reference Ferreira and Qiu2021). Furthermore, the language stimuli to be interpreted in the present study were relatively simple compared with those in real interpreting situations. Future work is now in a good position to explore prediction in interpreting using various non-semantic cues as well as the kinds of sentences and phrases that are used in real-world interpreting situations.
Data availability
The data that support the findings of this study are openly available in OSF at https://osf.io/54zup, DOI: 10.17605/OSF.IO/54ZUP.
Acknowledgments
This study was carried out while Yiguang Liu was a visiting PhD student at the Max Planck Institute for Psycholinguistics on invitation of F. Huettig. Yiguang Liu was also supported by a Chinese Scholarship Council Grant (No 201906320103). We thank Theres Grüter and two anonymous reviewers for their comments on a previous version of this manuscript.
Appendix A. Stimulus materials