How conceptualizing influences fluency in first and second language speech production

EMILY R. FELKER; HEIDI E. KLOCKMANN; NIVJA H. DE JONG

doi:10.1017/S0142716418000474

Published online by Cambridge University Press: 06 November 2018

EMILY R. FELKER: Radboud University, Nijmegen; and International Max Planck Research School for Language Sciences
HEIDI E. KLOCKMANN: Goethe-Universität Frankfurt
NIVJA H. DE JONG*: Leiden University
*: ADDRESS FOR CORRESPONDENCE Nivja H. De Jong, Leiden University Centre for Linguistics, Faculteit der Geesteswetenschappen, Leiden University, P.N. van Eyckhof 3, 2311 BV Leiden. E-mail: n.h.de.jong@hum.leidenuniv.nl

Abstract

When speaking in any language, speakers must conceptualize what they want to say before they can formulate and articulate their message. We present two experiments employing a novel experimental paradigm in which the formulating and articulating stages of speech production were kept identical across conditions of differing conceptualizing difficulty. We tracked the effect of difficulty in conceptualizing during the generation of speech (Experiment 1) and during the abandonment and regeneration of speech (Experiment 2) on speaking fluency by Dutch native speakers in their first (L1) and second (L2) language (English). The results showed that abandoning and especially regenerating a speech plan taxes the speaker, leading to disfluencies. For most fluency measures, the increases in disfluency were similar across L1 and L2. However, a significant interaction revealed that abandoning and regenerating a speech plan increases the time needed to solve conceptual difficulties while speaking in the L2 to a greater degree than in the L1. This finding supports theories in which cognitive resources for conceptualizing are shared with those used for later stages of speech planning. Furthermore, a practical implication for language assessment is that increasing the conceptual difficulty of speaking tasks should be considered with caution.

Keywords

conceptualizing; disfluencies; fluency; second language acquisition; speech production

Type: Original Article
Information: Applied Psycholinguistics, Volume 40, Issue 1, January 2019, pp. 111-136

DOI: https://doi.org/10.1017/S0142716418000474
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © Cambridge University Press 2018

While people can generally communicate through speech successfully, the processes underlying speech production are not always smooth and effortless. As a result, the stream of words people produce is typically punctuated by a range of different disfluencies, such as short pauses, stuttered repetitions and repairs, and filler words like uh and um. These disfluencies sometimes arise because speakers are having trouble figuring out how exactly to formulate or articulate an utterance. Second-language (L2) learners know well how the flow of their speech can be disrupted when they know what they want to communicate but struggle to express their intended message using the less familiar grammar and vocabulary of their L2. Still, even when speaking in their first language (L1), people often hesitate because they are trying to decide what it is that they want to convey in the first place. In this case, the problem lies not with linguistic encoding but with conceptualizing, or generating the content of a message for speech (Levelt, 1989).

Though the causes of particular types of disfluencies in speech are still not fully understood, some researchers (e.g., Fraundorf & Watson, 2013) have proposed that they may reflect problems at different levels of processing in speech production. This would mean that patterns of disfluency caused by linguistic encoding difficulties, such as performing syntactic operations or retrieving words from the mental lexicon, may differ from disfluency patterns linked to conceptual planning difficulties, such as deciding on the content of a message. Some recent studies have experimentally manipulated difficulty at one specific language production stage, such as lexical access (Hartsuiker & Notebaert, 2010) or morphosyntactic encoding (Mirdamadi & De Jong, 2015), to determine if processing difficulties at different production stages lead to distinct patterns of disfluency. However, the link between specific disfluency patterns and conceptual planning has not yet been fully established, as no studies to date have sufficiently isolated conceptualizing from the later speech production stages. Moreover, it is not yet known if conceptualizing difficulty has the same influence on fluency in L1 and L2, given that speaking in an L2 already places higher demands on attentional and processing resources (Kormos, 2006). By exploring the relationship between conceptualizing and fluency in both L1 and L2 speech production, the present study aims to clarify how conceptual difficulty in the very first stage of speech planning impacts the fluency of speech output in two cases: when the subsequent stages of linguistic encoding are relatively fast and automatic (as in L1) and when they are slower and more effortful (as in L2).
This information may shed light on the extent to which the conceptualizing and the later linguistic encoding stages in speech production draw on common cognitive resources, which would in turn have both theoretical implications for L2 speech processing and practical implications for the use of fluency measures in L2 proficiency tests.

FLUENCY IN RELATION TO L1 AND L2 SPEECH PRODUCTION MODELS

One of the most comprehensive psycholinguistic models of speech production is the blueprint of the speaker developed by Levelt (1989, 1999) and Levelt, Roelofs, and Meyer (1999). In this model, information flows forward incrementally through a series of processing stages that are grouped into three modules: the conceptualizer, the formulator, and the articulator. The conceptualizer generates a preverbal message through the two steps of macroplanning and microplanning. In macroplanning, the speaker selects and orders the information to be expressed that will satisfy a particular communicative intention. In microplanning, the preverbal message is further specified for focus and perspective, semantic relations, and conceptual features that are obligatorily expressed in the language being used. The conceptualizer’s output enters the formulator, where the appropriate lemmas from the mental lexicon are activated and placed into a syntactic surface structure through the process of grammatical encoding. The formulator also carries out morphophonological and phonetic encoding. When the articulator executes the phonetic plan, overt speech is produced. These basic steps of activating concepts, retrieving linguistic forms, and articulating speech are also central to connectionist models of speech production, such as those of Dell, Schwartz, Martin, Saffran, and Gagnon (1997).

Both Levelt’s blueprint of the speaker and Dell’s connectionist models can account for disfluencies in L1 speech production in several ways. According to Levelt’s model, problems in both inner speech and overt speech can be detected by the self-monitor via perceptual feedback loops. In connectionist models (e.g., Nozari, Dell, & Schwartz, 2011), error detection occurs not through a comprehension-based monitor but rather through a process of conflict monitoring by a domain-general executive center. Regardless of how errors are detected and corrected, both types of models would predict that the subsequent replanning of speech requires additional processing time, potentially leading to pausing. Error detection and correction are not the only source of disfluencies, however. Disfluencies can also result from processing difficulties at any point in the speech production process when one step takes too much time and the subsequent step is consequently delayed.

Speaking in a second language is typically more challenging than speaking in one’s native language, in large part due to incomplete linguistic knowledge of the L2 as well as having to inhibit the L1. According to Kormos (2006), the linguistic encoding processes of formulating and articulating in a second language are less automatic and often require conscious effort and attentional control, which leads them to run serially rather than in parallel. This contrasts with L1 speech production, where conscious attention and control are usually only required for conceptualizing. Furthermore, L2 speech is often characterized by more disfluencies than L1 speech (e.g., De Jong, Groenhout, Schoonen, & Hulstijn, 2015; Derwing, Munro, Thomson, & Rossiter, 2009; Towell, Hawkins, & Bazergui, 1996). To explain this observation, we should consider L2-specific speech production models.

While Levelt’s model was developed to explain monolingual speech production, more recent models have expanded this framework to cover speech production in bilingual or L2 speakers (De Bot, 1992; Segalowitz, 2010). De Bot’s and Segalowitz’s models both assume that the same basic psycholinguistic mechanisms underlie L1 and L2 speech production. According to De Bot, the first process in bilingual speech production, the macroplanning stage of conceptualizing, is language general, meaning it works the same way regardless of the language the utterance will ultimately be produced in. He posits that the subsequent microplanning stage is language specific, however, as different conceptual features need to be specified depending on which language is to be spoken. Segalowitz’s (2010) model is designed to show how L2 speech is vulnerable to disfluencies at many points in the speech production process because of the additional processing load imposed by devoting attention and effort to processes that occur more automatically in L1. Following De Bot’s reasoning about macroplanning being language general, Segalowitz does not predict that macroplanning demands should pose any additional L2-specific processing difficulties. In contrast, the later stages of microplanning, formulating, and articulating are predicted to lead to L2-specific disfluencies because of deficits in L2 linguistic knowledge and less automatized processing.

HOW CONCEPTUALIZING DIFFICULTY INFLUENCES FLUENCY

Given that speaking in an L2 increases cognitive processing demands in the later stages of speech production, but not necessarily in macroplanning, it remains an open question how a processing slowdown in that initial stage would impact fluency in L2 relative to in L1. In the L1 speech production literature, there has been some debate about whether the same attentional resources are drawn on by both macroplanning and microplanning (Greene & Cappella, 1986; Levelt, 1989; Roberts & Kirsner, 2000). If that were the case, an increase in macroplanning activity should also slow down microplanning and ultimately decrease fluency as the conceptualizer produces less material for the formulator in a given period of time. Studies examining temporal cycles of alternating fluency and hesitancy in monologues have provided some evidence that the conceptualizer’s generation of new speech plans requires significant attentional resources. Greene and Cappella (1986) theorized that in spontaneous speech, transitioning between subgoals or “moves” in discourse planning would place increased demands on central processing capacity. Therefore, they predicted that there would be more pausing at boundaries between ideas in the discourse, during which speakers would be engaged in planning their next move. A time series analysis revealed that most idea boundaries were associated with an increase in silent pausing. When speakers were given guidelines beforehand to structure their discourse, the tendency for idea boundaries to be associated with silent pausing was greatly reduced. That is, when conceptual planning demands were reduced, the disfluency at transitions between ideas was attenuated.

More recently, Roberts and Kirsner (2000) analyzed spontaneous speech samples and statistically verified the existence of temporal cycles of fluency. They found a strong and consistent tendency for topic shifts to be followed by greater fluency but preceded by more disfluency, using measures that combined silent and filled pauses. They interpreted their findings as supporting models in which macroplanning competes with other levels of speech production for a common pool of limited cognitive resources. In this sense, macroplanning could be a “cognitive bottleneck” that ties up cognitive processing resources and causes other levels of production to run less efficiently until it has finished.

Periods of relative disfluency coinciding with topic shifts in spontaneous speech may reflect the cognitive processing load involved in conceptualizing. However, experiments that actively manipulate the conceptualizing difficulty of speech can show its effects on fluency directly. Early psycholinguistic studies investigated how fluency was affected by how many possible alternative responses could be made to a given stimulus. Siegman and Pope (1966) found that when people orally described cards printed with ambiguous scenes, the pictures with more possible interpretations elicited speech with a higher proportion of filled pauses and repairs. Goldman-Eisler (1968) compared the simple task of describing comic strips with the more conceptually complex task of interpreting the same comic strips’ meaning, and she found that the proportion of silent pausing to total speech time was nearly twice as large when interpreting the comics as when merely describing them. Lay and Paivio (1969) also compared fluency across multiple speaking tasks of differing cognitive difficulty and demonstrated that various types of disfluencies increased with increasing task difficulty. Although these studies all support the notion that conceptualizing difficulty increases certain kinds of speech disfluencies, it is hard to discern whether the reported fluency differences across experimental conditions were exclusively due to conceptualizing demands. Because the various speaking tasks may have differed from each other in factors such as lexical difficulty, syntactic complexity, and sentence length, the difficulty of the formulating and articulating stages of speech production also likely varied between conditions.

More recently, researchers interested in how conceptualizing is linked to disfluencies have designed experiments with more controlled manipulations where the content of elicited speech is more comparable across conditions. For instance, Christenfeld (1994) tested the theory that the number of options a speaker is contemplating when deciding what to say contributes to the production of filled pauses. His participants had to describe the correct path through three different mazes: one with a single path from start to finish, one with choice points between two possible paths, and one with choice points among three possible paths. As predicted, the number of filled pauses per minute of speech increased as the maze complexity increased, and the number of filled pauses produced at choice points also increased when there were more path options. This experiment likely elicited speech that was relatively similar in vocabulary and structure across the three experimental conditions. However, as the analysis of disfluencies was limited to filled pauses, the effect of increased options for conceptualizing on other types of disfluencies remains an open question.

Another recent study that explored conceptual and planning-based factors related to fluency in speech production is that of Schnadt and Corley (2006), who employed network description tasks. In their experiments, participants viewed networks of interconnected objects on a computer screen. Their task was to describe the route taken by a marker that moved along the network of paths connecting the objects. Each pair of adjacent objects was connected by one, two, or three lines, so participants sometimes had to specify which of the multiple possible paths the marker took. It turned out that when there were more paths to choose from, people produced more filled pauses, prolongations, and repairs. One potential confounding factor in this experiment is that whenever there were multiple path options, the description also required a greater number of words (i.e., to specify whether to take the left path, right path, or middle path). Therefore, the increase in disfluencies with more path options could still have been partly due to the processing demands of formulating to produce more linguistic output, in addition to the heavier conceptualizing load.

Of existing studies on conceptualizing and speech production, the one that best controls linguistic output is that of Melinger and Kita (2007), who looked at the link between conceptualization processing load and the gesture production rate. Their participants described deterministic or nondeterministic networks of colored circles. Partway through the description of a given network, they were interrupted by one of two secondary tasks: either a spatial task that generated interference in spatial working memory or a task that used different cognitive resources. The former task was assumed to make subsequent macroplanning more difficult. The experiment was designed so that the content of the speech required after the secondary task was the same in both conditions. As predicted, subjects produced more gestures upon resuming their description of the network after the spatial task than after the nonspatial task. Though Melinger and Kita’s (2007) study was focused on the production of gestures, rather than speech disfluencies, their experimental design illustrates an effective method of varying conceptualizing demands while holding speech output constant.

RESEARCH QUESTIONS AND HYPOTHESES

On the whole, the research discussed above suggests that the increased processing load imposed by greater conceptualizing difficulty is likely to have a negative effect on fluency, at least in L1 speech production. However, there has been inconsistency in the degree to which the process of conceptualizing has been successfully isolated from later speech production stages, which makes it hard to draw clear conclusions about its unique impact on fluency. The present study aims to experimentally manipulate macroplanning difficulty in a controlled way, to examine a wide range of utterance fluency measures separately, and to clarify the link between conceptualizing and fluency in both L1 and L2 speech production. Two main research questions are addressed. First, what is the effect of macroplanning difficulty on utterance fluency in spontaneous speech production? Second, does increased conceptualizing difficulty cause the same or different patterns of disfluencies in L1 and L2 speech? In other words, is there an interaction between conceptualizing difficulty and language such that an increase in macroplanning demands will have a larger effect on disfluencies in L2 than in L1?

Regarding the first question, we hypothesize that when macroplanning is made more difficult, speech will become less fluent. We expect increased conceptual planning demands to induce more filled pauses, a result that has been previously reported in studies using experiments with different levels of conceptual or cognitive difficulty (e.g., Christenfeld, 1994; Lay & Paivio, 1969; Schnadt & Corley, 2006; Siegman & Pope, 1966). This finding would support the view that filled pauses are the type of disfluency most closely linked to the process of generating message-level plans (Fraundorf & Watson, 2013). We also predict that increased macroplanning difficulty will cause more silent pauses, in line with the results of previous studies comparing speech tasks of varying difficulty (e.g., Goldman-Eisler, 1968) and studies analyzing silent pauses surrounding idea boundaries in spontaneous speech (e.g., Greene & Cappella, 1986). If conceptualizing difficulty affects not only breakdown fluency but also repair fluency, then we would also expect that higher macroplanning difficulty would increase the occurrence of repetitions and repairs, in line with previous studies (e.g., Lay & Paivio, 1969; Schnadt & Corley, 2006). Finally, we expect that greater macroplanning difficulty will lead to more lengthenings of syllables, such as “the” pronounced like “thee,” as these types of prolongations have also been associated with planning problems in speech production (e.g., Fox Tree & Clark, 1997).
Of course, we cannot directly pinpoint the cause of any one disfluency, but our aim is to determine which of the abovementioned types of disfluency are influenced by changes in conceptualizing difficulty when the formulating and articulating processes are held constant by constraining the linguistic output.

With regard to the second research question, we predict that when conceptualizing difficulty is increased, this will have a negative effect on fluency in both L1 and L2, in terms of both the types of disfluencies and how many more disfluencies will be produced. Overall, we predict that L2 speech will be less fluent than L1 speech, which can be explained by any number of L2-specific difficulties in formulating and articulating (Segalowitz, 2010). Moreover, just as psycholinguistic research has shown that conceptualizing difficulty may be linked to various kinds of disfluencies in L1 speech production, it has been shown in the L2 acquisition literature that highly demanding speaking tasks with a greater level of cognitive complexity result in less fluent L2 speech (e.g., Ellis, 2009; Levkina & Gilabert, 2012; Robinson, 2001; Skehan & Foster, 1997). However, it is not entirely clear whether we should expect to find the same pattern and magnitude of conceptualizing-related disfluencies in both language conditions. On the one hand, we might predict that the patterns of disfluency will be the same in L1 and L2 as macroplanning, unlike some later stages in speech production, is theorized to be a language-independent process (De Bot, 1992). On the other hand, based on Roberts and Kirsner’s (2000) cognitive bottleneck account, we would expect to find at least some interaction effects between language and conceptual difficulty. This is because an increased macroplanning load would temporarily tie up resources needed by the formulator and articulator, and as these operate less efficiently in L2, it would be even harder in L2 for the whole speech production system to catch up again, leading to disproportionately more disfluencies as a result.

OVERVIEW OF THE PRESENT STUDY

The present study comprises two experiments that systematically manipulated the difficulty of macroplanning in both L1 and L2 speech in order to determine the effect of this manipulation on a range of disfluency types. Both experiments were network description tasks, similar to those used by Schnadt and Corley (2006). Like Christenfeld (1994), we operationalized macroplanning difficulty as the number of choices or alternative paths that participants had to consider at each node in the network. The experiments were designed such that the required speech output was identical regardless of the level of macroplanning difficulty. This way, the processes of formulating and articulating were constant across conditions, and comparing the fluency of speech across conditions could clarify which disfluency patterns were specifically related to conceptualizing difficulties.

Inspired by previous studies using online changes in visual stimuli to interrupt speech planning (e.g., Hartsuiker, Catchpole, De Jong, & Pickering, 2008), our experiments implemented online changes in the networks in order to make participants plan their speech anew at certain steps along the path. We used eye-tracking technology to track participants’ gaze while they were speaking. The online changes were triggered when their eyes fixated on certain objects at predetermined points in the network. This procedure was based on the assumption that people’s gaze follows the objects they are speaking about and that gaze duration is related to the time it takes for speakers to retrieve the phonological form of an object’s name (Griffin & Bock, 2000; Van der Meulen, 2001).

EXPERIMENT 1: APPEARING PATHS

In this experiment, participants had to describe paths in networks of pictures in which the target paths between the pictures only appeared onscreen one step at a time. This meant that participants had to continuously generate new speech plans. Macroplanning difficulty was operationalized as the number of distractor paths at each choice point in the network. Steps could appear in one of two conditions: easy when there was one target path and one distractor path and difficult when there was one target path and two or three distractors. The target path was always the same across both conditions, so the content of speech was identical regardless of the level of macroplanning difficulty.

Method

Participants

The participants were 25 students (18 female, 7 male) with a mean age of 22 years, who were recruited and tested at Utrecht University in the Netherlands. All were L1 speakers of Dutch with an intermediate to advanced level of L2 English proficiency. All participants had received at least 6 years of formal English training in high school, but none had ever enrolled as (BA or MA) students of English language and culture. Participants completed the LexTALE task for English (Lemhöfer & Broersma, 2012) and scored 72 points on average (SD = 18), equivalent to around the B2 level of English proficiency.¹

Materials

PICTURE STIMULI

This experiment used 54 pictures taken from the International Picture Naming Project (Bates et al., 2003; Severens, van Lommel, Ratinckx, & Hartsuiker, 2005), which has norms for these pictures in Dutch and English. Pictures were chosen such that name agreement was 96% or higher in Dutch (Severens et al., 2005) and English (Bates et al., 2003). Distractor pictures for individual networks were chosen randomly. The picture stimuli fit into different semantic categories such as animals, food, human-made objects, and leisure.

NETWORK STIMULI

Each network consisted of 16 picture slots on a 4×4 grid. The slots were connected by colored lines representing the paths in the network. Target paths were semantically related and consisted of six items. A six-item path has five steps, and each step along the path except the first and last counted as a single trial, so each network consisted of three trials, each of which could be easy or difficult. The first item on the path was marked with an “A” and the last item with a “B.” Figure 1 shows an example of two consecutive steps in an Appearing Paths network.

Figure 1 Two consecutive steps in one network in the Appearing Paths experiment. The green dot, not visible to the participant, indicates the eye fixation location. When the hammer is fixated, the key from the previous step fades out (left frame), and when the match is fixated, the hammer from the previous step fades out (right frame).

There were a total of 20 networks, each paired with a so-called mirror version in which each step on the network appeared in the opposite difficulty condition as in the original network. Thus, items that appeared in the easy condition in the first network would appear in the difficult condition in the mirror network, and vice versa. Figure 2 displays an example of the same item as it was presented in the easy and difficult conditions. The target paths were identical between the two versions, so the content of the required speech was the same. Two experimental lists were used, one containing the original networks and one containing the mirror networks. Participants received one list for their L1 and the other list for their L2.
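The mirror-list construction described above can be illustrated with a short sketch. The data layout (a network represented as a list of per-step condition labels), the network name, and the function names are assumptions made for illustration only; they are not the authors' materials-generation code.

```python
from typing import Dict, List

EASY, DIFFICULT = "easy", "difficult"


def mirror_network(conditions: List[str]) -> List[str]:
    """Return the mirror version: every step gets the opposite difficulty."""
    flip = {EASY: DIFFICULT, DIFFICULT: EASY}
    return [flip[c] for c in conditions]


def build_lists(originals: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Build the two experimental lists: the originals and their mirrors."""
    return {
        "original": {name: list(conds) for name, conds in originals.items()},
        "mirror": {name: mirror_network(conds) for name, conds in originals.items()},
    }


# Hypothetical network; the step conditions are invented for this example.
lists = build_lists({"animals": [EASY, DIFFICULT, DIFFICULT]})
print(lists["mirror"]["animals"])  # ['difficult', 'easy', 'easy']
```

Because the target path is untouched and only the condition labels flip, each item contributes to both difficulty conditions across the two lists while the required speech stays the same.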

Figure 2 A comparison of a single step in an Appearing Paths network (through the green line to the iron), presented in one list in the easy condition (left frame) and in the other list in the difficult condition (right frame). In both cases, the previous step from the flag is faded out and no longer an option. The green dot, not visible to the participant, indicates the eye fixation location.

Procedures

The experiment was conducted in L1 and L2 in two separate sessions held approximately 1 week apart. The procedure was identical for both sessions. Participants were familiarized with the set of 54 pictures in a self-paced picture-naming task. If participants did not know the name for an object, the name was provided to them. After familiarization, the network description task began. Participants were instructed that for each network, their objective was to describe a path between items that were semantically related to each other. For example, if the first item that appeared in a network was a turtle, the correct path might eventually include an owl, a zebra, a lion, a giraffe, and a pig, with human-made objects on distractor paths. Participants were told that their description should always include the name of the picture to which they were moving and the line color of the path they were taking. They received one example network before beginning the main test phase.

For each individual network, the procedure was as follows. Each time the participant fixated on a target item, the item was highlighted in blue and new paths and pictures branching out from the item appeared. At the same time, the previous target item and path faded to a light gray, while previous distractor paths and items disappeared entirely. A gaze duration of 500 ms on the target item was used to trigger these changes, because this duration had been shown to work best in pilot testing. If the participants chose the wrong path and fixated on a distractor object, the lack of any visual changes alerted them to their error. Throughout the task, participants’ speech was recorded.
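The gaze-contingent trigger can be sketched as a dwell-time check over a stream of gaze samples. Only the 500 ms threshold comes from the procedure; the sample format (timestamp in ms plus the name of the fixated object), the function name, and the assumption that the dwell must be continuous are illustrative choices, since the eye-tracking software and its interface are not specified here.

```python
from typing import Iterable, Optional, Tuple

DWELL_THRESHOLD_MS = 500  # gaze duration reported in the procedure


def first_trigger_time(
    samples: Iterable[Tuple[int, Optional[str]]],
    target: str,
    threshold_ms: int = DWELL_THRESHOLD_MS,
) -> Optional[int]:
    """Return the timestamp at which continuous gaze on `target` first
    reaches `threshold_ms`, or None if it never does.

    Assumption: looking away from the target resets the dwell timer.
    """
    dwell_start: Optional[int] = None
    for timestamp, fixated in samples:
        if fixated == target:
            if dwell_start is None:
                dwell_start = timestamp
            if timestamp - dwell_start >= threshold_ms:
                return timestamp
        else:
            dwell_start = None  # gaze left the target, so start over
    return None


# Hypothetical samples at 100 ms intervals; None means no object is fixated.
samples = [(0, "key"), (100, "hammer"), (200, "hammer"), (300, None)]
samples += [(t, "hammer") for t in range(400, 1000, 100)]
print(first_trigger_time(samples, "hammer"))  # 900
```

A fixation on a distractor never returns a trigger time for the target, which mirrors the procedure: the display simply does not change.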

Measures

As the experimental manipulation was the difficulty in describing the path from one item to the next, the speech recordings were divided into segments that each represented the description of a single step in the path (e.g., “from the red line to the turtle”). Each of the 20 networks contained three speech segments to be analyzed: the descriptions of the paths from the second to the third item, from the third to the fourth item, and from the fourth to the fifth item. The description of the very first path step was excluded from analysis because at that stage participants were still figuring out the semantic theme of the correct path for the first time. The last step was also excluded from analysis because participants did not have to consider any distractor paths there, given that the end target picture was always labeled as such. For each speech segment, we measured fluency in two ways: counting the presence of overt disfluencies and taking measures of speaking time. While taking these measurements, the annotator was blind to the experimental condition in which the speech segment was produced.

DISCRETE DISFLUENCIES

The following discrete disfluencies were annotated: filled pauses, silent pauses, lengthenings, repetitions, and repairs. Filled pauses were defined as instances of filler words indicating hesitation, such as “uh” and “um.” Silent pauses were defined as pauses lasting longer than 150 ms. This is a shorter criterion than sometimes used in the L2 speech production literature (e.g., 200 ms defined by Kormos, 2006, or 250 ms as advised by De Jong & Bosker, 2013). This shorter criterion was chosen because pauses were always counted within (rather than between) the already short path-step utterances, and because we wanted to use the same threshold for L1 and L2 pauses. Lengthenings were defined as instances of syllables that the annotator judged to have a noticeably drawn-out duration relative to the speaker’s typical pronunciation (e.g., “the” pronounced as “theee” or “thuuhh”). Repetitions were instances when a word or phoneme was quickly repeated without its identity being modified (e.g., “the b-blue line”). Repairs were instances when the speaker made an immediate self-correction, whether to correct a mispronounced word (e.g., “zèbra— zebra”) or to correct a wrong word (e.g., “the rrr-yellow line”).
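As an illustration of the silent pause criterion, the following sketch flags within-utterance gaps longer than 150 ms. It assumes that word onsets and offsets are available as time-aligned intervals in milliseconds; this input format and the function name are hypothetical and do not describe the annotation procedure actually used.

```python
from typing import List, Tuple

SILENT_PAUSE_THRESHOLD_MS = 150  # criterion stated in the text


def silent_pauses(
    word_intervals_ms: List[Tuple[float, float]],
    threshold_ms: float = SILENT_PAUSE_THRESHOLD_MS,
) -> List[Tuple[float, float]]:
    """Return (start, end) of every gap between consecutive words that
    lasts longer than threshold_ms.

    word_intervals_ms: (onset, offset) of each word within one utterance,
    in chronological order (hypothetical input format).
    """
    pauses = []
    for (_, prev_end), (next_start, _) in zip(
        word_intervals_ms, word_intervals_ms[1:]
    ):
        if next_start - prev_end > threshold_ms:
            pauses.append((prev_end, next_start))
    return pauses
```

Because the comparison is strict, a gap of exactly 150 ms is not counted, in line with the definition of pauses "lasting longer than 150 ms."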

SPEAKING TIME MEASURES

Two measurements related to speaking time were calculated only for the subset of trials without any overt disfluencies, which allowed for speaking time to be assessed independently. First, the length of each utterance as measured from the onset of the first syllable to the coda of the last syllable was taken as a measure of total speech duration for each segment. Second, for each utterance, the length of time from the moment of fixation on the critical object (as measured by the eye tracker) to the moment when the speaker began to pronounce the color of the path to that object was recorded. This measurement reflects the time it took for the speaker to commit to a choice about which path to follow at that step, because the colors of the different lines the participants had to choose from were always different. This duration includes any silent time before the speaker began describing that step of the path, and it may include speech from a previous utterance that was still unfolding when the fixation was measured. The “time from fixation to color name” measure is thus informative on top of the “total speech duration” measure because it more closely encapsulates the timeframe during which conceptualizing for that trial must have been occurring, including any conceptual planning time before the utterance. As this second measure relied on fixation data from the eye tracker, it was only calculated for trials where people’s eye movements were closely aligned to the items they were talking about. Thus, for this measure, we excluded trials where participants’ gaze was actually a step ahead of the item they were currently describing.

Results

Across both the L1 and L2 conditions for all networks and all participants, we began with 3,000 critical speech segments to analyze. Based on a visual inspection of the histogram of segment durations, we decided to exclude all trials in which the total speaking time to describe a path step was longer than 5 s, as this point reflected the beginning of the flat right-sided tail. Utterances longer than this cutoff point typically indicated substantial confusion or distraction on the part of the participant, and we only wanted to analyze trials with the expected speech output. In addition, we excluded all trials in which the speaker erred by taking the wrong path or when technical problems with the eye tracker disrupted the experiment temporarily. This resulted in 2,871 usable trials (95.7% of the total trials).
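The exclusion criteria and the resulting retention rate can be sketched as follows. The `Trial` record and its field names are hypothetical stand-ins for however the trial data were actually stored; only the 5 s cutoff and the trial counts come from the text.

```python
from dataclasses import dataclass

MAX_DURATION_S = 5.0  # cutoff reported in the text


@dataclass
class Trial:
    duration_s: float      # total speaking time for the path step
    wrong_path: bool       # speaker took a distractor path
    tracker_problem: bool  # eye tracker disrupted the trial


def is_usable(trial: Trial) -> bool:
    """Keep a trial only if it passes all three exclusion criteria."""
    return (
        trial.duration_s <= MAX_DURATION_S
        and not trial.wrong_path
        and not trial.tracker_problem
    )


def retention_rate(n_usable: int, n_total: int) -> float:
    """Percentage of trials retained, rounded to one decimal place."""
    return round(100 * n_usable / n_total, 1)
```

With the reported counts, `retention_rate(2871, 3000)` gives 95.7, matching the percentage stated above.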

Presence of disfluencies

For the Appearing Paths experiment, Table 1 shows the proportion of utterances in each condition that contained at least one occurrence of the given types of disfluencies.

Table 1 Appearing Paths: Percentage of utterances containing disfluencies across conditions

To determine if the differences between the two conditions were significant, we constructed generalized linear mixed-effects models for each disfluency type (Baayen, Davidson, & Bates, 2008; Quené & van den Bergh, 2008). In each model, the presence (vs. absence) of the disfluency was the dependent variable. The models’ fixed effects included the condition of each trial (easy or difficult choice), the language (L1 or L2), and the interaction between condition and language, and the random effects included participant and item number. When random slopes were added to the models, there were no significant improvements in model fit nor changes in the interpretation of results, so here we report the models without random slopes. These models therefore assume that the effect of conceptualizing difficulty on fluency does not differ across participants. The results of these models, with the easy choice condition taken as the intercept, are shown in Table 2. There were no main effects of conceptualizing difficulty on any of the disfluencies, nor were there any significant interaction effects between conceptualizing difficulty and language. However, there was one main effect of language: lengthenings occurred more often in L2 than in L1.
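For concreteness, a random-intercepts model of this kind can be written as below. This is a sketch under assumptions not stated in the text: a logit link, dummy coding with the easy choice and L1 as reference levels (consistent with the note to Table 2), and normally distributed random intercepts. The symbols are introduced here for illustration only.

```latex
\[
\operatorname{logit} P(y_{pi} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Difficult}_{pi}
  + \beta_2\,\mathrm{L2}_{pi}
  + \beta_3\,(\mathrm{Difficult}_{pi} \times \mathrm{L2}_{pi})
  + u_p + w_i ,
\qquad
u_p \sim \mathcal{N}(0, \sigma_u^2),\quad
w_i \sim \mathcal{N}(0, \sigma_w^2)
\]
```

Here $y_{pi}$ indicates whether the disfluency was present in the utterance by participant $p$ on item $i$, $\beta_0$ is the log odds in the easy choice, L1 condition, and $u_p$ and $w_i$ are the random intercepts for participant and item. The absence of a participant-specific term on $\beta_1$ corresponds to the assumption that the effect of difficulty does not vary across participants.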

Table 2 Appearing Paths: Generalized linear mixed-effects models for predicting different disfluency types

Note: Intercept represents the easy choice and L1 condition. *p<.001.

Speaking time measures

In addition to examining measures of breakdown and repair fluency, we examined speech fluency by comparing the speech time variables across the two levels of macroplanning difficulty and the two language conditions. Both speech time variables were only calculated for the subset of trials without disfluencies in order to examine speech time independently. However, the mean time from fixation to color name almost always spanned the last part of the preceding utterance and the first part of the current (target) utterance, and therefore it always included whatever silent pause came between the two utterances. Note that the latter measure was only calculated for fluent trials in which the participants’ speech kept pace with their eye movements, as discussed in Measures above. These descriptive statistics are presented in Table 3.

Table 3 Appearing Paths: Speaking time measures calculated for fluent trials

Next, as shown in Table 4, we used linear mixed-effects models to explain the speech time variables by setting the macroplanning difficulty, language conditions, and their interaction as the fixed effects and participant and item number as random effects. The easy choice condition was again treated as the baseline (intercept). The p values for each predictor in the models were calculated from the t statistics according to the conservative method described in Hox (2010, p. 46), which calculates the degrees of freedom as the number of second-level units (here 25 participants) minus the number of explanatory variables in the model (here six, counting the two random effects, three fixed effects, and intercept) minus one. Based on these models, the total utterance duration did not differ significantly as a function of the conceptual difficulty or whether the speech was in L1 or L2. However, the time from fixation to color naming was longer in the more difficult choice condition. For the total speaking time and for the time from fixation to color naming, there were no interaction effects between language and conceptualizing difficulty.
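The degrees-of-freedom rule just described amounts to a single subtraction, worked through in the sketch below with the values given in the text. The function and parameter names are hypothetical.

```python
def conservative_df(n_level2_units: int, n_explanatory: int) -> int:
    """Degrees of freedom as described in the text: second-level units
    minus explanatory variables minus one."""
    return n_level2_units - n_explanatory - 1


# 25 participants; six explanatory terms (two random effects,
# three fixed effects, and the intercept), as counted in the text.
df = conservative_df(n_level2_units=25, n_explanatory=6)
```

With these values, each t statistic is evaluated against a t distribution with 25 − 6 − 1 = 18 degrees of freedom.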

Table 4 Appearing Paths: Linear mixed-effects models for predicting speaking time measures

Note: Intercept represents the easy choice and L1 condition. *p<.01.

Discussion

The Appearing Paths experiment showed that an increase in macroplanning difficulty slowed down speech during the timeframe when conceptualizing was taking place, as reflected in the time from fixation to color name measure. However, conceptual difficulty did not lead to a significant increase in the five disfluency types we measured, despite some numerical trends in that direction. In other words, people did take slightly more time to speak while the conceptualizing demands were higher, but they managed to avoid interrupting the flow of their speech to do so. This could be because the difference in difficulty between the easy and difficult conditions (one distractor path vs. two or three distractor paths) did not increase conceptualizing demands enough for their effect on disfluencies to be shown. Here there was a practical limit to the number of paths we could require participants to choose from, whereas in everyday life speakers are faced with a far greater range of choices every time they generate a speech plan. In the next experiment, we compared three different levels of macroplanning difficulty in a more cognitively demanding task.

EXPERIMENT 2: CHANGING PATHS

In this experiment, participants were required to find and describe the shortest path between two pictures in a series of networks. During their path description, the network of paths would sometimes change at a predetermined point, forcing participants to revise their original speech plan. Macroplanning difficulty was operationalized as the number of distractor paths at the critical nodes where the network changed. The easy change condition was when there was one target path and one distractor path after the change, and the difficult change condition was when there was one target path and two or three distractors after the change. In the no-change condition, which served as a baseline, the network did not change during the participant’s path description. Because the correct path after the critical node was identical across the three conditions, the content of speech following the change was identical across conditions.