
2 - Theory Visualizations for Bilingual Models of Lexical Ambiguity Resolution

from Part I - Theoretical and Methodological Considerations

Published online by Cambridge University Press:  24 December 2019

Roberto R. Heredia
Texas A & M University
Anna B. Cieślicka
Texas A & M University


Connectionist models describe the human language system as a high-dimensional state space composed of a neural network in which layers of processing units for different aspects of a linguistic signal (visual or acoustic features, orthography, semantics, etc.) interact with one another. Lexical ambiguity emerges when there is conflicting input within or between processing layers. Bilingual connectionist models, such as the bilingual interactive activation (BIA) and bilingual interactive activation plus (BIA+) models, treat bilingualism as the inclusion of new dimensions into this network, resulting in new opportunities for conflict, such as interlingual homonyms and cognates. We outline connectionist accounts of lexical ambiguity resolution in monolinguals and bilinguals, which we visually depict as movement through a multidimensional state space.

Publisher: Cambridge University Press
Print publication year: 2020


We often think of bilingualism as “adding” a new and different language system to one’s existing language system. There might be a tiny bit of truth to this metaphor when learning a second language (L2) late in life. However, it surely must be a poor metaphor for bilinguals who learn their two languages relatively early in life. Rather than modeling bilingualism as involving the “adding” of a second processor for the L2, perhaps bilingualism itself can be treated as just another set of dimensions in the massive state space in which a speaker’s linguistic representations are organized, along with dimensions of situational context, text genre, grammatical gender, linguistic register, syntax, phonology, semantics, and so on (e.g., Onnis & Spivey, 2012). When these various aspects of language are treated not as submodules within the language module but instead as dimensions in a single state space, then suddenly new insights can be gained in understanding how language is processed in general and how bilinguals process lexical ambiguity in particular. In fact, the very concept of a lexical representation changes dramatically when one switches from a computer (or dictionary) metaphor of the lexicon to a dynamical system account of word knowledge (Elman, 2004).

In this chapter, we review some connectionist models of bilingualism and discuss how they might deal with lexical ambiguity; but, first, we examine what lexical ambiguity itself “looks like” in the state space of a language processing system. By treating the representational parameters of a model as dimensions in a state space, the range of behaviors (and regions visited) in that volumetric space can be identified more systematically. In a state space that combines a variety of linguistic aspects, a word representation can be seen as extending not only across a semantic field (e.g., Lehrer, 1974) but indeed across a lexical field that combines the semantics, phonology, and situational context of how the word is typically used (e.g., Elman, 2009; see also Lyons, 1963). By studying the real-time temporal dynamics of lexical ambiguity resolution in bilinguals (e.g., Altarriba & Gianico, 2003), it may be possible to better see inside the structure of this state space in which language is represented.

Traditional approaches to understanding lexical ambiguity resolution relied heavily on the computer metaphor of the mind, positing a modular processor for lexical access followed by a subsequent processor for context effects (Swinney, 1979; Tanenhaus, Leiman, & Seidenberg, 1979). Experiencing the uncertainty of reading or hearing a word like bug, which could mean an insect or a spy device, was likened to activating two separate dictionary entries, one of which would soon have to be deactivated by the context processor for comprehension to be successful. Rather than relying on this box-and-arrow computer metaphor, one can instead treat the word bug as having one simple circumscribed region in the phonological dimensions (since it is a homophone) but projecting onto two disparate regions in the semantic dimensions of the massive state space of language (since it has two rather different meanings). When those phonological and semantic dimensions are combined to form one phono-semantic state space, the region dedicated to the word bug is seen as a single bounded, but very nonconvex, shape. In fact, when only certain dimensions are shown and compressed just right, the lexical field for bug would look roughly like a letter V, as in Figure 2.1(a). It is exactly this nonconvexity that gives us insight into what lexical representations might “look like” in bilinguals. In this dynamical system account, when hearing or reading the word bug, the human mind visits portions of this bounded shape, and the other contextual dimensions (some semantic, discourse, and situational dimensions not depicted here) help push the state of the system toward one or the other arm of that V-shape, gradually achieving a contextually appropriate understanding of the word.

Figure 2.1 Theory visualizations of lexical fields in linguistic state space: (a) lexical ambiguity involves a highly nonconvex shape that covers unrelated regions of semantic space; (b) polysemy involves a relatively more convex shape that includes interstitial regions of semantic space; and (c) temporary phonological ambiguity, as with cohorts, often involves a highly nonconvex shape again, one that heavily depends on temporal dynamics.

However, not all ambiguous words have meanings that are as unrelated to one another as those of bug. Take, for example, the verb dusted in Sentence (2.1) below.
(2.1) The chef dusted the cake with powdered sugar, but then the maid dusted it clean.

The verb dusted is typically referred to as polysemous, rather than ambiguous, because its different meanings/usages are at least somewhat semantically related to one another (Gibbs & Matlock, 2001). Yet instead of treating ambiguous words and polysemous words as if they were categorically different phenomena, a dynamical-systems state space description allows one to visualize the graded similarity between the two phenomena. Figure 2.1(b) shows how the lexical field for dusted would be somewhat similar to that for bug but notably different in that the semantic regions used for its different meanings are spatially contiguous with one another, allowing for blends across that semantic spectrum.
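The convexity contrast between bug and dusted can be made concrete with a deliberately crude, one-dimensional sketch. Here a lexical field is assumed to be a union of intervals on a single hypothetical semantic axis, and it counts as convex if the midpoint of any two of its points also lies inside it. The interval endpoints are invented for illustration and are not read off Figure 2.1.

```python
# Hypothetical 1D semantic axis; each lexical field is a union of closed intervals.
BUG = [(0.0, 1.0), (4.0, 5.0)]      # insect ... spy device: two disjoint regions
DUSTED = [(0.0, 3.0), (2.0, 5.0)]   # sprinkle ... wipe clean: contiguous regions


def contains(field, x):
    """True if point x falls inside any interval of the field."""
    return any(lo <= x <= hi for lo, hi in field)


def looks_convex(field, samples=101):
    """Sample points of the field and check that every pairwise midpoint is also inside."""
    lo = min(a for a, _ in field)
    hi = max(b for _, b in field)
    grid = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    members = [p for p in grid if contains(field, p)]
    return all(contains(field, (a + b) / 2) for a in members for b in members)
```

With these toy values, the midpoint between the insect and spy-device regions of BUG falls in a gap, so the field is nonconvex, whereas the overlapping regions of DUSTED leave no gap.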

In addition to ambiguous words and polysemous words, another form of lexical ambiguity arises temporarily during the first couple of hundred milliseconds of hearing a spoken word. For example, halfway through hearing the word candle, a listener will briefly exhibit partial activation of a similar-sounding cohort word like candy (e.g., Marslen-Wilson, 1987; McClelland & Elman, 1986) and will even look at a picture of a candy before finally looking at the target object, a candle (Allopenna, Magnuson, & Tanenhaus, 1998; Spivey-Knowlton, 1996). Similarly, if one reads Sentence (2.2) out loud, a listener may find that the context leading up to the first syllable of the final word steers them somewhat toward expecting the word stupid instead of stupendous; Figure 2.1(c) provides a rough sketch of what that temporally dynamic lexical field might look like in linguistic state space.
(2.2) In his ridiculous costumes, Sacha Baron Cohen looks just totally stupendous.

We will revisit those temporally dynamic lexical fields later in our discussion, after we have reviewed some of the literature on how connectionist models of bilingualism might address lexical ambiguity and the literature on how bilinguals actually process spoken words. For now, we return to temporally static treatments of linguistic state space.

When one considers the wide range of idiosyncratic linguistic experiences that each language user undergoes, it seems clear that the topology of any one person’s linguistic state space will be at least subtly different from everyone else’s. Individual differences account for a substantial amount of the variance in language learning and processing in both monolinguals and bilinguals (e.g., Dörnyei, 2005; Grosjean, 1994). For example, lexical ambiguity resolution has been shown to function rather differently for people with high vs. low memory spans (Miyake, Just, & Carpenter, 1994). (However, some of the variance attributed to memory span might instead be explained by degree of language experience; MacDonald & Christiansen, 2002.) People with high memory spans are able to understand the correct meaning of boxer in Sentence (2.3) more readily than people with low memory spans.
(2.3) Since Ken really liked the boxer, he took a bus to the pet store to buy the animal.

Someone with a low memory span (or limited language experience with English) might have a lexical field for the word boxer that, functionally speaking, spans only a narrow, relatively convex range of semantic space (Figure 2.2(a)). Thus, when reading the word boxer, that person might automatically settle into the pugilist meaning of the word and then encounter some difficulty understanding the rest of Sentence (2.3). By contrast, someone with a high memory span (or extensive language experience with English) might have a lexical field for boxer that stretches out into a variety of regions of linguistic state space (Figure 2.2(b)). Therefore, when reading the word boxer, that person might not settle too deeply into any one tendril of that lexical field; and when the rest of the sentence finally provides the disambiguating context, they are ready and able to transition into the contextually appropriate region of semantic space.

Figure 2.2 Individual differences in lexical fields: (a) a person with low memory span or limited English experience would have a functionally narrow lexical field for the word boxer, whereas (b) a person with high memory span or extensive English experience would have a more tentacular lexical field for boxer, with tendrils that stretch into a variety of semantic spaces.

Similar to memory span and language experience, contextual diversity will also introduce substantial individual differences in the topology of this linguistic state space. Contextual diversity measures the frequency with which a word occurs in significantly different contexts. Take, for example, the words posterior and piglet. They both have the same overall lexical frequency: 240 occurrences each in a 560-million-word corpus. Therefore, traditional approaches in psycholinguistics would predict that these two words should exhibit equal latency in reading and reaction time tasks (e.g., Forster & Chambers, 1973). However, about 60 percent of the occurrences of posterior take place in academic texts, while only 10–15 percent each occur in fiction, magazine, and newspaper contexts, and it almost never shows up in spoken contexts. Therefore, contextually speaking, posterior is a relatively nondiverse word. In our linguistic state space framework, this would mean that its lexical field is relatively simple and convex (Figure 2.3(a)). As a result, if the language system started out in a random or neutral location in state space and was forced to traverse its way to the region for posterior, it might have a long distance to travel, thus producing a somewhat long response time. By contrast, the word piglet has a much more evenly distributed pattern of occurrences across these different contexts: only 40 percent of its occurrences take place in fiction, about 20 percent each in magazines and newspapers, and 10 percent each in academic and spoken contexts (Figure 2.3(b)). Therefore, if the language system started out in a random or neutral location in state space and was forced to travel to the piglet region, it would likely have a relatively short distance to travel, and thus produce a short response time, even though piglet has the same overall lexical frequency as posterior.
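The difference in how evenly the two words are spread across genres can be quantified. One simple proxy, which is our assumption here rather than a contextual diversity measure taken from the studies cited, is the Shannon entropy of a word's genre distribution. The sketch uses the percentages given above, splits posterior's remaining 40 percent evenly (about 13.3 percent each) across fiction, magazines, and newspapers, and treats its spoken share as zero.

```python
import math


def genre_entropy(percentages):
    """Shannon entropy (bits) of a word's distribution over genres; higher = more evenly spread."""
    total = sum(percentages.values())
    probs = [v / total for v in percentages.values() if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# Percentages from the text; the even 40/3 split for posterior is an assumption.
POSTERIOR = {"academic": 60, "fiction": 40 / 3, "magazine": 40 / 3, "newspaper": 40 / 3, "spoken": 0}
PIGLET = {"academic": 10, "fiction": 40, "magazine": 20, "newspaper": 20, "spoken": 10}

h_posterior = genre_entropy(POSTERIOR)  # about 1.60 bits
h_piglet = genre_entropy(PIGLET)        # about 2.12 bits
h_max = math.log2(len(PIGLET))          # about 2.32 bits: all five genres equally likely
```

Because both words have the same token count, raw frequency cannot separate them, but this entropy proxy can, which mirrors the argument in the text.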

Figure 2.3 Contextual diversity of lexical fields: (a) one region of linguistic-genre space in which the lexical field for posterior shows itself to be nondiverse and rather convex; (b) another region of the same space in which the lexical field for piglet stretches itself nonconvexly into diverse contexts.

That data pattern is exactly what Adelman, Brown, and Quesada (2006) found when they reanalyzed the data from six word identification experiments: contextual diversity predicted fast and slow response times more robustly than did lexical frequency. Results like these have been replicated and extended to word learning (Hills et al., 2010; Johns, Dye, & Jones, 2016) and to eye movement measures of whole sentence reading (Chen et al., 2017; Plummer, Perea, & Rayner, 2014). After decades of assuming that word frequency was a bedrock foundation for psycholinguistics, it appears that the language system does not actually care how many times a lexical representation has been instantiated; it cares how far in state space it has to travel right now in order to reach that lexical representation.

Given these complex transformations of state space that are generated by individual differences in working memory, language experience, or contextual diversity, just imagine the transformations that must take place as a result of being bilingual. Rather than assuming that bilinguals process lexical ambiguity in some categorically different way than monolinguals do, perhaps this graded range of idiosyncratic state space topologies (in Figures 2.1–2.3) allows one to consider a bilingual’s linguistic state space as just another variety of these kinds of individual differences, but an especially interesting one, to be sure. In this framework, almost every word that a bilingual hears will have a few extra tendrils in its lexical field, compared to a monolingual, that provide potential branchings-off into different regions of linguistic state space. Now that we are equipped with theory visualizations for the kinds of shapes that lexical ambiguity can take in the state space of the language system, we turn to discussing connectionist models of lexical ambiguity resolution and of bilingualism.

Parallel Distributed Processing Models of Word Recognition

The bilingual interactive activation (BIA) and BIA+ models (Dijkstra & van Heuven, 2002; van Heuven, Dijkstra, & Grainger, 1998) are extensions of an earlier parallel distributed processing (PDP) model, the interactive activation model (IAM) (McClelland & Rumelhart, 1981). Therefore, to better understand how the BIA models function, it is worthwhile to first discuss the IAM and related PDP models more generally. To that end, this section describes the structure and mechanisms of PDP models in the simplest case of unambiguous word recognition by monolinguals. Then, the following section describes how PDP models account for lexical ambiguity resolution. We then go on to describe how the BIA models account for bilingual-specific phenomena involving homographs, homonyms, cognates, and interlingual cohorts.

The IAM (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982) is a multilevel connectionist architecture originally designed to simulate the word superiority effect, a classic perceptual phenomenon whereby identification of visually presented letters is faster when the letters are inside words rather than nonwords (McClelland & Johnston, 1977). The logic underlying the IAM is that recognition of letters begins first with recognition of basic visual features, followed by activation of letters containing those features, which in turn activates words containing those letters. The IAM simulates the word superiority effect by allowing feedback connections from words to letters, such that recognition of letters is facilitated when words become active.

Structurally, the IAM includes three layers of interconnected nodes: a feature layer, a letter layer, and a word layer (see Figure 2.4(a)). The solid lines with arrows indicate excitatory connections, while the dashed lines with circles represent inhibitory connections. Current activation of each node is represented by the thickness of the border around the node. The feature layer contains nodes that become active in the presence of specific, simple visual features, analogous to orientation-selective cells in the visual system. The letter and word layers contain nodes corresponding to all of the known letters and words, respectively. Letter position within a string may also be encoded, such that there is a node for each letter at each possible position. Individual feature detectors have excitatory connections with every letter in which they are found. Individual letters, in turn, have excitatory connections with every word in which they are found. Meanwhile, each word has inhibitory connections with all other words; and, crucially, there are both excitatory and inhibitory feedback connections from the word layer to the letter layer, such that word nodes send excitation to any letter they contain and inhibition to any they do not. It is these feedback connections that allow the model to reproduce the word superiority effect.
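As a data-structure sketch of the letter- and word-level connectivity just described, the snippet below builds signed connection tables for a toy lexicon of four-letter words with position-specific letter nodes. It rests on stated assumptions: the lexicon is hypothetical, weights are reduced to +1 (excitatory) and -1 (inhibitory), the feature layer is omitted, and only letters that occur somewhere in the lexicon get nodes.

```python
LEXICON = ["WORK", "WORD", "WEAR", "BEAR"]  # hypothetical toy lexicon


def build_connections(lexicon):
    """Signed connection tables for a toy, position-specific interactive-activation network."""
    # One letter node per (position, letter) pair that occurs in the lexicon.
    letters = {(p, w[p]) for w in lexicon for p in range(len(w))}
    # Letter -> word: excitatory link from a positioned letter to every word containing it there.
    letter_to_word = {(lt, w): 1 for w in lexicon for lt in letters if w[lt[0]] == lt[1]}
    # Word -> letter feedback: +1 to letters the word contains at that position, -1 otherwise.
    word_to_letter = {(w, lt): (1 if w[lt[0]] == lt[1] else -1) for w in lexicon for lt in letters}
    # Word <-> word: mutual inhibition between every pair of distinct words.
    word_to_word = {(u, v): -1 for u in lexicon for v in lexicon if u != v}
    return letters, letter_to_word, word_to_letter, word_to_word


letters, letter_to_word, word_to_letter, word_to_word = build_connections(LEXICON)
```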

Figure 2.4 (a) McClelland and Rumelhart’s (1981) interactive activation model processing the letter R; (b) Kawamoto’s (1993) PDP model of lexical ambiguity resolution with a sample of all connections shown; (c) an example energy landscape that determines the trajectory of a system as it traverses its state space; and (d) Dijkstra and van Heuven’s (2002) BIA+ model

On presentation of a visual stimulus, the features contained in the string of letters first become active. The feature nodes then pass their activation to nodes representing letters at their specified location in the string. This creates a set of candidate letters that the system is “considering,” consisting of all letters containing features present in the input. The letter nodes then pass activation to words that are consistent with those letters. Competition among words, via their mutual inhibitory connections, results in the most highly consistent word (or words, in the case that there is ambiguity) becoming more active, while all other words are suppressed. The active word nodes then pass activation to the letters they contain. In this way, letters that are presented in the context of a word receive additional activation, relative to letters in nonwords, which is how the model accounts for the word superiority effect.
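The processing sequence above can be sketched as a small simulation. This is a minimal sketch, not the published model. The feature layer is replaced by constant bottom-up input to the presented letters, the four-word lexicon is hypothetical, and all parameter values are illustrative assumptions. The update rule follows the general interactive-activation form: positive net input drives a node toward its maximum, negative net input drives it toward its minimum, and activation decays toward rest. Comparing the letter K in WORK with the same K in the nonword XQZK shows a word-context advantage that arises from word-to-letter feedback.

```python
LEXICON = ["WORK", "WORD", "WEAR", "BEAR"]  # hypothetical toy lexicon

# Illustrative parameter values (assumptions, not taken from the chapter).
LETTER_WORD_EXC, LETTER_WORD_INH = 0.07, 0.04
WORD_WORD_INH = 0.21
WORD_LETTER_EXC, WORD_LETTER_INH = 0.30, 0.02
DECAY, A_MAX, A_MIN, REST = 0.07, 1.0, -0.2, 0.0
LETTER_INPUT = 0.2  # constant bottom-up support standing in for the feature layer


def update(a, net):
    """Move toward A_MAX (positive net) or A_MIN (negative net), with decay toward REST."""
    effect = net * (A_MAX - a) if net > 0 else net * (a - A_MIN)
    return a + effect - DECAY * (a - REST)


def run(stimulus, cycles=30):
    chars = set("".join(LEXICON)) | set(stimulus)
    letters = {(p, ch): REST for p in range(4) for ch in chars}
    words = {w: REST for w in LEXICON}
    for _ in range(cycles):
        # Only positive activation is transmitted to other nodes.
        out_l = {k: max(a, 0.0) for k, a in letters.items()}
        out_w = {w: max(a, 0.0) for w, a in words.items()}
        new_words = {}
        for w in LEXICON:
            net = sum((LETTER_WORD_EXC if w[p] == ch else -LETTER_WORD_INH) * o
                      for (p, ch), o in out_l.items())
            net -= WORD_WORD_INH * sum(o for v, o in out_w.items() if v != w)
            new_words[w] = update(words[w], net)
        new_letters = {}
        for (p, ch), a in letters.items():
            net = LETTER_INPUT if stimulus[p] == ch else 0.0
            net += sum((WORD_LETTER_EXC if w[p] == ch else -WORD_LETTER_INH) * o
                       for w, o in out_w.items())
            new_letters[(p, ch)] = update(a, net)
        letters, words = new_letters, new_words  # synchronous update
    return letters, words


letters_w, words_w = run("WORK")  # letter K inside a word
letters_n, words_n = run("XQZK")  # letter K inside a nonword
k_in_word, k_in_nonword = letters_w[(3, "K")], letters_n[(3, "K")]
```

With these settings the nonword string leaves every word node at or below zero, so its K receives only bottom-up input; in WORK, the winning word node feeds excitation back to K.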

While the original IAM dealt primarily with visual word recognition, the TRACE model (McClelland & Elman, 1986; and its reimplementation, jTRACE: Strauss, Harris, & Magnuson, 2007) extended the same principles to model spoken word recognition. In TRACE, the letter layer of the original IAM is replaced with a phoneme layer, and the feature layer now consists of nodes responding gradiently to various acoustic dimensions rather than visual features. Since speech unfolds over time, the input to TRACE is a sequence of acoustic features, in contrast to the visual IAM, which is presented with all visual information simultaneously. As a result of this sequential presentation, even unambiguous speech input is temporarily ambiguous at the word level: Onsets are consistent with many possible words and, as more of the input is received, the pool of consistent words is narrowed until finally the offset leaves only a single candidate.
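The narrowing of the candidate pool can be illustrated with a strict prefix filter. This ignores TRACE's graded activations and its time-aligned node copies, and it uses letters as stand-ins for phonemes in a hypothetical five-word lexicon. It shows only why sequential input makes even an unambiguous word temporarily ambiguous.

```python
LEXICON = ["candle", "candy", "can", "cap", "handle"]  # letters stand in for phonemes


def consistent_words(lexicon, heard):
    """Words whose onset matches every segment received so far."""
    return [w for w in lexicon if w.startswith(heard)]


spoken = "candle"
# Size of the candidate pool after each successive segment of the input.
pool_sizes = [len(consistent_words(LEXICON, spoken[:t])) for t in range(1, len(spoken) + 1)]
```

Here the pool shrinks from four candidates after the first segment to a single one once the fifth segment arrives.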

In this way, TRACE captures the predictions of the Cohort model of speech processing (Marslen-Wilson, 1987), which held that lexical access occurs as a sequential search by method of elimination. Importantly, lexical access in Cohort is all-or-none in that words that are inconsistent with an onset are eliminated from consideration. As a result, the Cohort model cannot recover in the case of degraded information or mispronunciations. In contrast, TRACE is a continuous mapping model, meaning that activation flows continuously between layers, such that a given word unit can still receive activation even if some part of the input is inconsistent with it. As a result, TRACE provides a better fit to the behavioral data, which show, for example, that listeners partially activate rhyme cohorts that have a different onset (e.g., making eye fixations to a speaker when the spoken input is beaker; Allopenna et al., 1998). It is worth noting, however, that TRACE does not provide a perfect fit to behavioral data: There is also evidence that listeners partially activate anadromes, words with the same phonemes in a different order (e.g., making eye fixations to a sub when the input is bus; Toscano, Anderson, & McMurray, 2013). TRACE encodes information about temporal ordering by including copies of each phoneme node corresponding to each of the possible positions in an input stream (which can be similarly implemented with letter position in the IAM), but the aforementioned results suggest this might not be perfectly representative of human speech processing. Still, TRACE has stood the test of time as one of the best general models of speech processing and is able to capture a wide range of phenomena related to lexical ambiguity resolution, as we will discuss in more detail in the next section.
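The contrast between all-or-none elimination and continuous mapping can be shown with a toy scoring rule. The sketch below is not TRACE. Letters again stand in for phonemes, and the graded score (position-by-position matches minus a penalty for mismatches, scaled by input length) is an assumption. It is chosen only to show that a word inconsistent with the onset can keep partial support, and that a mispronounced input can still be recognized.

```python
LEXICON = ["candle", "candy", "handle", "sandal"]  # hypothetical toy lexicon


def cohort(lexicon, heard):
    """All-or-none: keep only words whose onset matches everything heard so far."""
    return [w for w in lexicon if w.startswith(heard)]


def graded_support(lexicon, heard, mismatch_penalty=1.0):
    """Position-aligned overlap; a mismatch lowers, but need not eliminate, a word's support."""
    scores = {}
    for w in lexicon:
        matches = sum(1 for a, b in zip(heard, w) if a == b)
        mismatches = len(heard) - matches
        scores[w] = max(0.0, matches - mismatch_penalty * mismatches) / len(heard)
    return scores


rhyme_case = graded_support(LEXICON, "handle")    # rhyme competitor "candle" keeps support
mispronounced = graded_support(LEXICON, "candlo")  # no exact match exists in the lexicon
```

In this toy, the all-or-none filter rejects candle as soon as it hears handle and returns nothing at all for candlo, whereas the graded score still gives the rhyme partial support and still ranks candle highest for the mispronounced input.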

These PDP models provide the foundation for the BIA models. The earliest form of the BIA model (van Heuven et al., 1998) extended the orthographic-only IAM (McClelland & Rumelhart, 1981) by adding two lexicons and a set of language nodes. The later BIA+ model (Dijkstra & van Heuven, 2002) made the conceptual addition of phonological encoding, similar to that of TRACE (McClelland & Elman, 1986). The continuous mapping property of these interactive activation models means that even unambiguous stimuli result in temporary uncertainty in the network. As a result, these models extend gracefully to ambiguous inputs, which are dealt with in the same fashion as unambiguous ones. In the next section, we examine the behavior of PDP models in the specific case of lexical ambiguity resolution, and we show that they again provide a robust fit to behavioral data.

Lexical Ambiguity Resolution in PDP Models

Early behavioral evidence in lexical ambiguity resolution showed that both meanings of an ambiguous word appear to become active at least briefly, irrespective of preceding context, suggesting that lexical access occurs first in a context-free stage of processing, followed by a second stage of processing that integrates context (Swinney, 1979; Tanenhaus et al., 1979). Later findings challenged those results. For example, Tabossi (1988) demonstrated that in sentential contexts that are sufficiently biasing, the contextually appropriate meaning of a homograph is selectively activated. In the parlance of Figure 2.1(a), one can imagine both a weak context that places the system in a region of the bug lexical field that is roughly equidistant from its two semantic endpoints and a strong context that places the system already deep into one of those semantic endpoints. Vu, Kellas, and Paul (1998) extended Tabossi’s findings by showing that multiple sources of contextual bias can be seen to influence lexical access independently, such that priming of a target word is influenced by a convergence of biases from multiple cues. These and other studies began to swing the balance of evidence in favor of a PDP type of account, and there is by now a very large body of work demonstrating continuous bidirectional interaction between subsystems of the language system (for review, see Spevack et al., 2018).

PDP models are able to capture the general pattern of behavioral data regarding lexical ambiguity resolution. To understand how, let us consider Kawamoto’s (1993) influential PDP model of lexical ambiguity resolution, shown in Figure 2.4(b). In contrast to the IAM and TRACE models discussed in the previous section, Kawamoto’s model makes use of distributed rather than localist representations. Localist models have a one-to-one mapping between nodes and represented entities as well as hard-coded connections between entities. For example, the IAM and TRACE have a single node for each feature, letter, and word, and the connections between them are specified by the modeler in advance. In those models, access of a lexical entry corresponds to activation of the corresponding lexical node, and hence these models make it simple to compare the activity of multiple word nodes over time.

Distributed models, on the other hand, encode representations as a pattern of activity across many neurons that represent various features or microfeatures. In Kawamoto’s (1993) model, each lexical entry corresponds to a vector of activation values for 216 nodes, which are meant to capture all features of a word: The first 48 nodes encode visual features that define the orthography of the word; the next 48 nodes encode phonetic features in specified positions, corresponding to pronunciation; the next 24 nodes encode part-of-speech; and the last 96 nodes encode meaning. While the total pattern across all nodes is unique with respect to each lexical entry, each individual feature (meaning each possible value for any of the 216 nodes) is consistent with several lexical entries. As a result, the representation of each lexical entry is partially overlapping with several other entries.
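The layout of such a distributed entry, and how it differs from a localist code, can be sketched as follows. The field sizes come from the description above. Several details are illustrative choices rather than details of Kawamoto's implementation: each node is assumed to take a value of -1 or +1, the field contents are random, and the two senses of bug are given one shared noun pattern.

```python
import random

# Field sizes from the text: 48 + 48 + 24 + 96 = 216 nodes per lexical entry.
FIELDS = {"orthography": 48, "phonology": 48, "part_of_speech": 24, "meaning": 96}
N_NODES = sum(FIELDS.values())


def random_field(rng, size):
    """Assumption: each node takes a binary value of -1 or +1."""
    return [rng.choice((-1, 1)) for _ in range(size)]


def make_entry(orth, phon, pos, meaning):
    entry = orth + phon + pos + meaning  # concatenated in the order given in the text
    assert len(entry) == N_NODES
    return entry


def agreement(a, b):
    """Fraction of nodes on which two distributed patterns agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def one_hot(index, size):
    """Localist alternative: one dedicated node per word."""
    return [1 if i == index else 0 for i in range(size)]


rng = random.Random(0)
spelling, sound = random_field(rng, 48), random_field(rng, 48)
noun = random_field(rng, 24)
# A homograph: same form and part of speech, two unrelated meaning patterns.
bug_insect = make_entry(spelling, sound, noun, random_field(rng, 96))
bug_device = make_entry(spelling, sound, noun, random_field(rng, 96))

shared_form = agreement(bug_insect[:120], bug_device[:120])  # 1.0: identical form fields
overall = agreement(bug_insect, bug_device)                   # partial overlap
localist_overlap = sum(a * b for a, b in zip(one_hot(0, 2), one_hot(1, 2)))  # 0
```

The two distributed entries agree on every form node and on roughly half of the meaning nodes, whereas two localist word nodes share nothing. Overlap between distributed entries is what lets the network generalize, and also what creates competition.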

Another important difference between distributed and localist models is that, in the former, the strength of connections between nodes must be learned by the network, rather than coded by the researcher. Kawamoto’s model is fully connected, meaning there are bidirectional links between each of the 216 nodes. While it would, of course, be infeasible to manually code all connections in a network of this size, this property of distributed networks is actually a feature and not a bug: These models are intended to capture developmental phenomena by teaching a lexicon to the network. The network begins with connection strengths of 0 between all nodes. During training, lexical entries (vectors of 216 activation values) are presented to the network, which spreads activation according to its connection strengths, eventually settling into a stable activity pattern. Initially, this stable pattern will not match the target pattern corresponding to the lexical entry, so an error correction algorithm is used to modify the connection strengths after each training trial, bringing the output closer to the target pattern. After training, features that co-occur in a word develop stronger connections, such that when some subset of a word’s features is presented to the network (e.g., only orthography or pronunciation), the full pattern of activity for that lexical entry may emerge in the network. A lexical entry has been accessed by the network when the full pattern of activity settles into a stable state that matches some lexical entry.
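The learn-then-complete cycle can be sketched with a small autoassociator, under several assumptions. The three lexical entries are random ±1 patterns over the 216 nodes. Learning starts from zero weights and uses the Widrow-Hoff delta rule, one common error-correction algorithm. Retrieval uses a brain-state-in-a-box style update, which adds the network's feedback to the current state and clips each node to [-1, 1]. The cue contains only the orthographic field, and the question is whether the network fills in the meaning field.

```python
import random

N_ORTH, N_PHON, N_POS, N_SEM = 48, 48, 24, 96
N = N_ORTH + N_PHON + N_POS + N_SEM  # 216 nodes, as in the text
ORTH = slice(0, N_ORTH)
SEM = slice(N - N_SEM, N)


def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def train(patterns, epochs=10, lr=1.0 / N):
    """Delta-rule learning for a fully connected autoassociator, starting from zero weights."""
    W = [[0.0] * N for _ in range(N)]
    for _ in range(epochs):
        for p in patterns:
            out = matvec(W, p)
            for i in range(N):
                step = lr * (p[i] - out[i])  # error on node i drives the weight change
                W[i] = [w + step * pj for w, pj in zip(W[i], p)]
    return W


def settle(W, cue, alpha=0.5, steps=30):
    """Brain-state-in-a-box style settling: add feedback, then clip each node to [-1, 1]."""
    x = list(cue)
    for _ in range(steps):
        fb = matvec(W, x)
        x = [max(-1.0, min(1.0, xi + alpha * fi)) for xi, fi in zip(x, fb)]
    return x


rng = random.Random(42)
lexicon = [[rng.choice((-1.0, 1.0)) for _ in range(N)] for _ in range(3)]  # hypothetical entries
W = train(lexicon)

target = lexicon[0]
cue = [0.0] * N
cue[ORTH] = target[ORTH]  # present only the spelling
recalled = settle(W, cue)
```

Because the learned weights reproduce each stored pattern, the stored pattern is a stable corner of the box. A partial cue that overlaps it most strongly is pulled toward that corner, so the meaning field is filled in.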

The behavior of the network is best understood as operating in a high-dimensional state space, where each node serves as a dimension, and the activation of all nodes is a set of coordinates that describes the location of the system in the state space at that point in time. When the network is presented with an ambiguous word, its location in the state space (i.e., its activation pattern) moves in a direction that is somewhat toward both regions that belong to the two meanings of that word. Gradually, as context and other factors bias the system’s interpretation of this ambiguous word, the trajectory will curve toward the region in state space that corresponds to the contextually appropriate meaning. This nonlinear trajectory of the system, as it moves through state space, can be mathematically described as following along the contours of an energy landscape that is imposed on the volume of the state space by external inputs, context, and its neural connectivity pattern of excitatory and inhibitory synapses. This energy landscape describes how certain regions of state space have a strong attracting force and other regions may have a weak attracting force. Interspersed among these basins of attraction in the state space are other regions that repel the system away from them (peaks in the energy landscape). The simplified sketches of basins of attraction in Figures 2.1–2.3 have associated with them energy landscapes that make some portions of them more strongly attracting and other portions less so. For instance, Figure 2.4(c) shows an example of an energy landscape where the state space of the system would correspond to the two-dimensional floor of that three-dimensional space, and the height dimension corresponds to the potential energy of the system. Much like a marble would roll with gravity and momentum, the state of the system (indicated as a black circle on the manifold surface of Figure 2.4(c)) rolls down the energy landscape’s nonlinear slopes and settles into an attractor basin (which corresponds to a location in space that belongs to a word’s meaning).
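A two-dimensional landscape with two basins gives a runnable version of the marble picture. The landscape below is our assumption, not the one plotted in Figure 2.4(c). In it, x separates the two meanings, y is a second dimension with a single trough, and a small tilt h stands in for context. The state starts on the ridge between the basins and follows plain gradient descent, without the momentum that a physical marble would have.

```python
def energy(x, y, h):
    """Two basins near x = -1 and x = +1 (two meanings); h tilts the landscape (context)."""
    return (x * x - 1.0) ** 2 / 4.0 + y * y / 2.0 - h * x


def gradient(x, y, h):
    return x * (x * x - 1.0) - h, y


def roll(x, y, h, step=0.05, n_steps=2000):
    """Plain gradient descent: the state repeatedly moves a small step downhill."""
    path = [(x, y)]
    for _ in range(n_steps):
        gx, gy = gradient(x, y, h)
        x, y = x - step * gx, y - step * gy
        path.append((x, y))
    return path


# Start on the ridge between the basins; the sign of the tilt decides where the state ends up.
toward_a = roll(0.0, 0.8, h=0.1)
toward_b = roll(0.0, 0.8, h=-0.1)
```

With a small step size, the energy never increases along the path, and the same starting point ends in different basins depending only on the sign of the contextual tilt.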

In Kawamoto’s (1993) simulations, unambiguous words were recognized (and settled in their energy landscapes) more quickly than biased ambiguous words (words having one sense that is more common or dominant than the other). This makes sense because an unambiguous word will have only one attractor basin, and a biased ambiguous word (Figure 2.4(c)) will have two attractor basins, resulting in some competition or vacillation between those two regions in state space. Equi-biased ambiguous words, however, were recognized even more slowly than the biased ambiguous words, because, while those biased ambiguous words have two attractor basins, one of them is much steeper/stronger than the other. By contrast, the equi-biased ambiguous words have two attractor basins that are nearly equal in strength, so the system takes longer to finally settle into one of them. Importantly, Kawamoto found that sentence context has a differential effect on biased and equi-biased ambiguous words. With equi-biased ambiguous words, context was highly effective at tipping the balance and causing the system to settle into the contextually appropriate attractor basin. However, with biased ambiguous words, only a very strongly biasing context would be capable of pushing the system toward the less common (or subordinate) meaning of that word.
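Kawamoto's ordering of recognition times, and the asymmetric effect of context, can be caricatured in one dimension. The sketch assumes the dynamics dx/dt = x - x^3 + h, starting from the neutral point x = 0. Positive x stands for the dominant meaning, and h is the sum of a frequency bias and a context term. When h exceeds 2/(3√3) ≈ 0.385, only one basin remains, which serves here as the unambiguous case. The specific h values are arbitrary, and settling time is counted as the number of Euler steps until |x| reaches 0.9.

```python
def settle(bias, context=0.0, dt=0.01, threshold=0.9, max_steps=100_000):
    """Euler-integrate dx/dt = x - x**3 + h from x = 0, where h = bias + context.

    Returns (steps until |x| >= threshold, True if the dominant (+) basin was reached).
    """
    x, h = 0.0, bias + context
    for step in range(1, max_steps + 1):
        x += dt * (x - x ** 3 + h)
        if abs(x) >= threshold:
            return step, x > 0
    return max_steps, x > 0


UNAMBIGUOUS, BIASED, EQUIBIASED = 1.0, 0.3, 0.02  # assumed tilt values

t_unamb, _ = settle(UNAMBIGUOUS)
t_biased, _ = settle(BIASED)
t_equi, _ = settle(EQUIBIASED)

# Context pulling toward the subordinate (-) meaning.
biased_weak = settle(BIASED, context=-0.2)[1]      # dominant meaning still wins
biased_strong = settle(BIASED, context=-0.6)[1]    # strong context flips it
equi_weak = settle(EQUIBIASED, context=-0.2)[1]    # weak context suffices
```

In this caricature, the single-basin case settles fastest and the nearly balanced case slowest. A weak context flips the equi-biased word but not the biased one, which matches the qualitative pattern described above.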

The work reviewed here illustrates the power of PDP models for explaining lexical ambiguity resolution. Through the imagery of a high-dimensional state space, with an energy landscape determining its dynamics, it becomes clear how these models can capture both delayed effects of context (Swinney, 1979; Tanenhaus et al., 1979) and early effects of context (Tabossi, 1988; Vu et al., 1998).

Bilingual Interactive Activation

Although early theories of bilingual language processing proposed that bilinguals could selectively activate one of their languages and deactivate the other (Macnamara & Kushnir, 1971), the behavioral data now overwhelmingly support a parallel interactive account, in which orthographic or phonological input simultaneously activates representations in both languages. For example, eye-tracking studies have shown that hearing spoken words in one language can lead to eye fixations on a distractor object whose name is phonologically similar in the task-irrelevant language (Marian & Spivey, 2003a; Spivey & Marian, 1999). When instructed to pick up the marker, Russian-English bilinguals frequently look first at a stamp (called marka in Russian) before finally fixating the marker (Spivey & Marian, 1999). Importantly, the magnitude of this interlingual cohort effect depends on several factors, including language experience (Weber & Cutler, 2004), phonetic featural similarity (Ju & Luce, 2004), and recent use (Marian & Spivey, 2003b). Similar results have been obtained for written input (De Groot, Delmaar, & Lupker, 2000; Dijkstra, Grainger, & van Heuven, 1999), with activation of words in the irrelevant language being possible even when there is orthographic but no phonological overlap (Marian & Kaushanskaya, 2004) or vice versa (Kaushanskaya & Marian, 2004). Furthermore, cross-linguistic interference has been found to depend on the number of orthographic neighbors of the target word in the nontarget language (van Heuven, Dijkstra, & Grainger, 1998).
Taken together, the experimental evidence indicates that, for bilingual speakers, both orthography and phonology can activate words consistent with the input in both languages, orthography activates phonology and vice versa, and language history and stimulus characteristics play important roles (van Hell & Tanner, 2012). As such, these results are consistent with a PDP account of bilingual lexical processing, in which multiple parameters are brought together as dimensions in a high-dimensional state space (Onnis & Spivey, 2012).

The BIA (van Heuven et al., 1998) and BIA+ (Dijkstra & van Heuven, 2002) were built on top of the original IAM (McClelland & Rumelhart, 1981) to handle bilingual language processing, in which words from two or more languages may overlap in features. The BIA model (Figure 2.4d), like the IAM, includes layers of localist nodes encoding features, letters, and words, respectively (although a distributed-coding version of this model has been proposed; French, 1998; Jacquet & French, 2002). These layers work identically to those of the IAM: feature nodes activate letters (in a specified position within the word), nodes for letters in each position activate words with which they are consistent, and word nodes have feedback connections with letter nodes and inhibitory connections with all other word nodes. The BIA+ added layers for phonology and semantics (for simplicity, our discussion hereafter will focus on this version of the model). The lexicon of the BIA+ now includes words from two languages instead of one. Importantly, this architecture models bilinguals as having a unified lexicon: Letters activate words in both languages indiscriminately, and words across languages retain inhibitory connections.
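To show what a unified, language-blind lexicon means computationally, here is a toy sketch of the word layer only. Assumptions: a hypothetical five-word lexicon, hypothetical weights, letter input reduced to a count of position-specific letter matches (no feature layer, no letter-to-word inhibition, no top-down feedback), and an update rule in the style of the original IAM (McClelland & Rumelhart, 1981), in which positive net input drives a unit toward its maximum, negative net input drives it toward its minimum, and activation decays toward rest.

```python
# Toy sketch of a language-blind word layer in the spirit of the BIA/BIA+.
# The lexicon, weights, and input coding are hypothetical simplifications.

LEXICON = [("main", "EN"), ("main", "FR"), ("rain", "EN"), ("pain", "FR"), ("mais", "FR")]

MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, 0.0, 0.1
LETTER_TO_WORD = 0.07   # excitation per matching letter (hypothetical)
WORD_TO_WORD = 0.21     # lateral inhibition between any two words (hypothetical)


def letter_matches(word, letters):
    """Count position-specific letter matches between a word and the input."""
    return sum(1 for w, x in zip(word, letters) if w == x)


def iam_update(act, net):
    """IAM-style update: push toward MAX_A or MIN_A, then decay toward REST."""
    effect = net * (MAX_A - act) if net > 0 else net * (act - MIN_A)
    return act + effect - DECAY * (act - REST)


def present(letters, cycles=30):
    """Present a letter string and return each word node's final activation."""
    acts = [REST] * len(LEXICON)
    for _ in range(cycles):
        new_acts = []
        for i, (word, _lang) in enumerate(LEXICON):
            bottom_up = LETTER_TO_WORD * letter_matches(word, letters)
            # Inhibition comes from every other active word, whatever its language.
            lateral = WORD_TO_WORD * sum(max(a, 0.0) for j, a in enumerate(acts) if j != i)
            new_acts.append(iam_update(acts[i], bottom_up - lateral))
        acts = new_acts
    return {f"{w}/{lang}": a for (w, lang), a in zip(LEXICON, acts)}


result = present("main")
print(result["main/EN"] == result["main/FR"])  # True: nothing here distinguishes the languages
print(result["main/EN"] > result["rain/EN"])   # True: the full match beats a one-letter mismatch
```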

The most important difference between the IAM and BIA+ lies in the addition of a top-most language layer. This layer includes two nodes – one for each language – that have bidirectional excitatory connections with all words in that language and inhibitory connections with all words in the other language. This layer models the concept of a language mode, as suggested by Grosjean (2001), whereby recent exposure to one language will prime that language, resulting in processing costs when switching languages (Altarriba et al., 1996; Meuter & Allport, 1999).

While lexical ambiguity in the monolingual case refers to intralingual homonyms, homographs, and homophones, bilingual models need to account for the addition of interlingual ambiguities as well. Cross-language lexical ambiguities are functionally represented in the BIA models by the inclusion of two separate word nodes, one in each language, which differ in some of their connections to the orthography, phonology, and semantic nodes, and exclusively activate their respective language nodes. For the present purposes, interlingual ambiguities can be divided into three classes. First, cognates are pairs of words that have the same spelling and meaning in two languages. For example, actor has the same meaning in English and Spanish but slightly different phonology. In the model, the two word nodes corresponding to a pair of cognates will have the same connections to the orthography and semantic nodes and some of the same connections to the phonology layer (depending on the degree of phonological similarity across the two languages). Orthographic input to the model will activate both words equally, which will then mutually inhibit each other via the inhibitory connections between all words. Hence, this type of ambiguity cannot be resolved without help from the language nodes. Prior unambiguous input to the model in one of the two languages will selectively activate the corresponding language node, which then acts to inhibit all word nodes in the opposing language. This alters the starting activation levels of the word nodes, allowing the node in the relevant language to more strongly inhibit its counterpart and win the competition.
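The cognate case can be sketched with two word nodes that receive identical bottom-up input and inhibit each other. In this toy version (hypothetical weights; language nodes clamped to fixed values rather than driven by word activity), the competition ends in an exact tie unless a language node is active, mirroring the argument above.

```python
# Toy sketch: two word nodes for the cognate "actor" (English and Spanish)
# receive identical bottom-up support; clamped language nodes supply top-down
# bias. All weights are hypothetical, and clamping the language nodes is a
# simplification made for illustration.

MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, 0.0, 0.1
BOTTOM_UP = 0.3       # identical orthographic support for both nodes
LATERAL = 0.4         # word-to-word inhibition, blind to language
LANG_EXCITE = 0.1     # language node -> words of its own language
LANG_INHIBIT = 0.1    # language node -> words of the other language


def iam_update(act, net):
    """IAM-style update: push toward MAX_A or MIN_A, then decay toward REST."""
    effect = net * (MAX_A - act) if net > 0 else net * (act - MIN_A)
    return act + effect - DECAY * (act - REST)


def resolve_cognate(lang_nodes, cycles=50):
    """lang_nodes maps 'EN'/'ES' to a clamped language-node activation."""
    acts = {"EN": REST, "ES": REST}
    for _ in range(cycles):
        new_acts = {}
        for lang, other in (("EN", "ES"), ("ES", "EN")):
            top_down = LANG_EXCITE * lang_nodes[lang] - LANG_INHIBIT * lang_nodes[other]
            net = BOTTOM_UP + top_down - LATERAL * max(acts[other], 0.0)
            new_acts[lang] = iam_update(acts[lang], net)
        acts = new_acts
    return acts


print(resolve_cognate({"EN": 0.0, "ES": 0.0}))  # identical activations: no resolution
print(resolve_cognate({"EN": 1.0, "ES": 0.0}))  # prior English input: the English node wins
```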

Next, false cognates, or interlingual homographs, are pairs of words with the same spelling but different meanings in each language. For example, main is a synonym for primary in English but means hand in French (with a fairly different pronunciation). These would be represented in the BIA models as word nodes in each language that are identical in their connections to the orthography layer, partly different in their connections to the phonology layer, and completely different in their connections to the semantic layer. Ambiguity resolution in this case could again occur through priming of the language nodes or, alternatively, through contextual bias. If, for example, sentential context activates semantic nodes corresponding to one of the two resolutions of the ambiguity, this shifts the initial state of the system closer to that option. A sufficiently biasing sentential context, even in the nontarget language, could override the influence of the language nodes, leading the system to correctly recognize a code-switched word that does not match the language of the sentential context. This is consistent with experimental evidence showing that the processing cost of switching languages depends on contextual bias (Li, 1996; Moreno, Federmeier, & Kutas, 2002). Furthermore, results have shown that code-switching is easier when the phonology of the code-switched word differs from that of the context language (Grosjean, 1995; Li, 1996). In the BIA models, this would be explained by the fact that code-switched words with minimal phonological overlap with the context language activate fewer competitors in the context language, leading to faster resolution.
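Contextual bias can be added to the same kind of two-node competition. In the sketch below, which is hypothetical in all of its weights, an English language node supports the English reading of main, while sentence context supplies semantic support to the French reading (hand); whichever node receives more total support ends up more active.

```python
# Toy sketch: the interlingual homograph "main" has an English node (meaning
# "primary") and a French node (meaning "hand"). A language node favors English;
# sentence context supports the French meaning. Weights are hypothetical.

MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, 0.0, 0.1
BOTTOM_UP, LATERAL = 0.3, 0.4
LANG_BIAS = 0.1   # English language node: +0.1 to English words, -0.1 to French words


def iam_update(act, net):
    """IAM-style update: push toward MAX_A or MIN_A, then decay toward REST."""
    effect = net * (MAX_A - act) if net > 0 else net * (act - MIN_A)
    return act + effect - DECAY * (act - REST)


def read_homograph(french_context, cycles=50):
    """Return the winning reading given semantic support for the French meaning."""
    support = {"EN": BOTTOM_UP + LANG_BIAS, "FR": BOTTOM_UP - LANG_BIAS + french_context}
    acts = {"EN": REST, "FR": REST}
    for _ in range(cycles):
        # Both nodes update from the previous cycle's activations.
        acts = {
            lang: iam_update(acts[lang], support[lang] - LATERAL * max(acts[other], 0.0))
            for lang, other in (("EN", "FR"), ("FR", "EN"))
        }
    return max(acts, key=acts.get)


print(read_homograph(french_context=0.05))  # EN: weak context loses to the language node
print(read_homograph(french_context=0.3))   # FR: strong context overrides it
```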

Finally, partial cognates, or interlingual cohorts, are pairs of words across languages that partially overlap in spelling or phonology. For example, the English word shark is an interlingual cohort of sharik (the Russian word for balloon). Because the bottom-up connections in the BIA models are not language-selective, any input will send activation to orthographic or phonological neighbors in both languages, and the degree of competition in the network will depend on the number of neighbors. Figure 2.5 uses the lexical-fields framework from Figures 2.1–2.3 to depict the regions of state space that can be visited while the word shark is being presented to a monolingual English speaker (Figure 2.5a) or to a Russian-English bilingual (Figure 2.5b). For a monolingual, the lexical field (or energy landscape) for that word stretches into a few different regions of semantic state space, and the trajectory (or activation pattern over time) will be somewhat nonlinear as it curves slightly toward the wrong word. By contrast, a bilingual’s lexical field stretches out into several more regions of state space, resulting in an exceptionally curved trajectory. While shark is being presented, the patterns of activation in BIA+ would mimic this kind of state-space trajectory, moving close to an interlingual cohort competitor before finally settling into the correct pattern of activation.
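The growth in competition can be illustrated by counting how many candidates stay compatible with the input as shark unfolds letter by letter. This sketch swaps in a simple prefix-overlap (cohort-style) measure for illustration; the BIA+ itself relies on position-specific orthographic and phonological overlap. The word lists are hypothetical, and the Russian words are transliterated.

```python
# Toy sketch: count the lexical candidates compatible with each successive prefix
# of the input. Prefix overlap is a cohort-style simplification; the word lists
# are hypothetical and the Russian entries are transliterated.

ENGLISH = ["shark", "sharp", "shard", "ship", "car"]
RUSSIAN = ["sharik", "shar", "sharf", "koshka"]   # balloon, ball, scarf, cat


def cohort(prefix, lexicons):
    """Return (language, word) pairs whose spelling begins with the prefix."""
    return [(lang, w) for lang, words in lexicons.items() for w in words if w.startswith(prefix)]


def cohort_sizes(word, lexicons):
    """Number of compatible candidates after each successive letter."""
    return [len(cohort(word[:i], lexicons)) for i in range(1, len(word) + 1)]


monolingual = {"EN": ENGLISH}
bilingual = {"EN": ENGLISH, "RU": RUSSIAN}
print(cohort_sizes("shark", monolingual))  # [4, 4, 3, 3, 1]
print(cohort_sizes("shark", bilingual))    # [7, 7, 6, 6, 1]
```

Summing each list gives a crude index of how much competition the input meets before it is resolved: 15 candidate-steps for the monolingual lexicon versus 27 for the bilingual one.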

Figure 2.5 (a) For a monolingual, the linguistic input shark has orthographic and phonological similarity to shark, sharp, and a few other words; (b) for a bilingual, that same input has similarity with even more lexical representations, thus producing an extremely nonconvex lexical field and an even more nonlinear trajectory

However, feedback from the language nodes in the BIA+ model can lead to asymmetric competition, whereby intralingual competitors in the primed language exert more influence than interlingual competitors from the other language. As an example from human data, when Marian and Spivey (2003a) placed Russian-English bilinguals into a relatively monolingual Russian language mode (with a consent form in Russian, the experimenter speaking only native Russian, and Russian music in the background), those participants exhibited substantial lexical competition from intralingual competitors in Russian but less from interlingual competitors in English. Ultimately, however, resolution in the case of partial cognates will reliably be accomplished in the BIA+ model purely through bottom-up information (without the need for context), since words in either language with partially inconsistent orthography or phonology receive less activation than the target word.

As in Kawamoto’s (1993) model, contextual priming in the BIA+ model can also influence the initial state of the system and thus bias it toward one meaning of an ambiguous word. For example, Schwartz and Kroll (2006) found that, when sentence context was weak, bilingual participants processed cognates faster than words that exist in only one language, indicating that lexical representations from both languages affected processing. However, when sentence context was strong, this effect disappeared, suggesting that comprehension was guided selectively to the meaning in only one of the languages (see also Libben & Titone, 2009).

Because work using the BIA models has not specifically focused on simulating lexical ambiguity resolution tasks, it is important to note that the account we have given here is somewhat speculative in nature. However, with a general understanding of PDP principles and the structure of the BIA models, we expect that this account will by now be intuitively clear. Since the BIA models allow parallel, bottom-up activation of words in both languages, interlingual ambiguities are really not that different from intralingual ambiguities with monolinguals. How quickly the system can resolve these ambiguities, and which resolution ultimately wins, is dependent on the starting state of the system – via priming of language or semantic nodes – and the overall energy landscape, which determines the degree of attraction toward various outcomes.


Obviously, bilingualism does not involve having a new and separate lexicon module inserted into a person’s cortex. Learning an L2, early or late in life, involves rewiring the existing connectivity of multiple language areas of the brain. This network of networks has some portions that are mostly specialized for one language or the other (Kim et al., 1997), but it also has many portions that are used by both languages (Marian, Spivey, & Hirsch, 2003). The BIA and BIA+ models of bilingual language processing have pursued that general kind of architecture and produced results that correspond well with human data (Dijkstra & van Heuven, 2002; van Heuven et al., 1998). As a result of this type of cortical connectivity in a bilingual, reading or hearing a word from one language can inadvertently partially activate a lexical representation in the other language. It turns out that this process in bilinguals is not that different from related processes in monolinguals: When monolinguals read or hear a word in their language, they also exhibit inadvertent partial activation of other related lexical representations.

Rather than thinking of these lexical representations as line entries in a mental dictionary, some of which get partially activated, we have chosen a different framework here for understanding how ambiguity (temporary or otherwise) causes the language system to vacillate between multiple possible interpretations. We have chosen a state-space framework, wherein lexical representations exist as attractor basins, some with a strong or weak pull, some with partial overlap with one another, and some with tendrils that stretch out to semantically disparate regions of state space. Those tentacular lexical attractor basins, whose tendrils reach out in many directions in state space, may be unusually prevalent in bilinguals, compared to monolinguals.

While the dictionary framework is clearly a metaphor, intended to help one imagine how words might be organized in the language system, the state-space framework need not be conceived as a metaphor (Onnis & Spivey, 2012). When one takes a neural network, such as a brain or a connectionist model, and treats each node’s activation as a coordinate in a state space, this serves as a mathematical description of the state of the actual system (Elman, 2004; Spivey, 2008), not a metaphor. Scientific metaphors always break down at some point and can provide misleading insights (Hoffman, 1980). In the case of a simulated neural network processing two languages, as its state changes from timestep to timestep, one can access all the data necessary to provide an accurate state-space description of the system, perhaps performing a dimensionality reduction down to two or three dimensions for purposes of data visualization (Elman, 1991). In the case of an actual brain-and-body processing two languages, however, it is of course not possible to measure every node in the network. Nonetheless, we can measure quite a bit; and, when the right behavioral measures are chosen carefully and sampled as continuously as possible (e.g., Louwerse et al., 2012), those behaviors can be seen as performing something similar to the dimensionality reduction performed on the simulated neural network, thus allowing us to witness a low-dimensional record of the high-dimensional mental trajectory (Spivey & Dale, 2006, p. 209). Importantly, even with quantitatively abstracted data, from recorded behaviors that result from hidden neural processes, we are still not using a metaphor when we plot those data into a state space for data visualization.
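Dimensionality reduction of a simulated network’s trajectory can be sketched in a few lines. The example below assumes NumPy and uses principal component analysis (computed via singular value decomposition) on a hypothetical 50-node network whose activation trajectory is constructed to lie on a curved two-dimensional path. Real network or behavioral data would not be this tidy, so the explained-variance figure is a property of the toy data only.

```python
# Toy sketch: a 50-node "network" whose activation trajectory secretly lies on a
# curved 2-D path, recovered by principal component analysis (PCA via SVD).
# The trajectory and network size are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
latent = np.column_stack([t, t**2])                 # a curved low-dimensional trajectory
basis, _ = np.linalg.qr(rng.normal(size=(50, 2)))   # random orthonormal 50 x 2 embedding
states = latent @ basis.T                           # 100 timesteps x 50 node activations


def reduce_trajectory(states, k=2):
    """Project states onto their top-k principal components."""
    centered = states - states.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # variance share per component
    return centered @ vt[:k].T, explained[:k]


coords, explained = reduce_trajectory(states)
print(coords.shape)     # (100, 2): one 2-D point per timestep, ready to plot
print(explained.sum())  # close to 1.0: two dimensions capture this toy trajectory
```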
The neural dimensions have been reduced by the motor system in poorly understood ways, but there is no figurative analogy being used to liken linguistic processes to something else, such as a book with lexical entries listed in alphabetical order.

In this chapter, we have provided a series of theory visualizations as proxies for those data visualizations. Armed with state-space trajectories of connectionist networks addressing lexical ambiguity resolution in monolingual conditions and in bilingual conditions, one can see that the attractor basins corresponding to word representations come in a wide variety of shapes and sizes. Bilingualism may not instigate a qualitatively different format of processing but instead may just introduce a quantitative change in the distribution of those different shapes and sizes. Compared to monolinguals, bilinguals may experience a little more phonological (and in some cases orthographic) overlap in their lexical fields, which may result in a little more lexical competition on a regular basis. Perhaps it is this incessant practice with increased lexical competition that trains a bilingual’s brain to have greater cognitive control (e.g., Kroll & Bialystok, 2013; Spivey & Cardon, 2015). If one must have a metaphor, then – far from being a dictionary – the mental lexicon is perhaps more like a high-dimensional golf course with sandpits, greens, and fairways all interlacing among one another; and a bilingual’s golf course is especially tangled.


Ambiguous words, Bilingual interactive activation (BIA) model, Bilingual interactive activation plus (BIA+) model, Bilingual lexical processing, Code-switching, Cognates, Cohort model, Connectionist models, Contextual diversity, Continuous mapping model, Cross-linguistic interference, Distributed networks, Distributed representation, Dynamical-systems state space, False cognates, Feature layer, Homographs, Homonyms, Homophones, Individual differences, Inhibitory connections, Interlingual ambiguities, Interlingual cohort effect, Interactive activation model (IAM), jTRACE, Language experience, Language mode, Language module, Letter layer, Lexical access, Lexical ambiguity, Lexical entries, Lexical-fields framework, Localist models, Microfeatures, Parallel distributed processing (PDP), Partial cognates, Phoneme layer, Phonetic featural similarity, Phono-semantic state space, Phonologically similar, Polysemous, Rhyme-cohorts, Semantic field, Semantics, Sentence context, Sequential search, Situation context, Theory visualizations of bilingual lexical ambiguity, Theory visualizations of lexical fields, TRACE, Word superiority effect

Thought Questions

1. What are the pros and cons of localist versus distributed connectionist models of bilingualism? Is one more appropriate than the other?

2. Age of acquisition is not modeled in the BIA or BIA+ but is known to have important effects. How might age of acquisition be integrated with these models?

3. Views of embodied cognition (e.g., Barsalou, 2008) suggest that action, planning, and sensorimotor representations may also play roles in language processing. How might these or other cues influence bilingual processing?


References

Adelman, J. S., Brown, G. D., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17(9), 814–823.
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38(4), 419–439.
Altarriba, J., & Gianico, J. L. (2003). Lexical ambiguity resolution across languages: A theoretical and empirical review. Experimental Psychology, 50(3), 159–170.
Altarriba, J., Kroll, J. F., Sholl, A., & Rayner, K. (1996). The influence of lexical and conceptual constraints on reading mixed-language sentences: Evidence from eye fixations and naming times. Memory and Cognition, 24(4), 477–492.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
Chen, Q., Huang, X., Bai, L., Xu, X., Yang, Y., & Tanenhaus, M. K. (2017). The effect of contextual diversity on eye movements in Chinese sentence reading. Psychonomic Bulletin and Review, 24(2), 510–518.
Dale, R., Fusaroli, R., Duran, N. D., & Richardson, D. C. (2013). The self-organization of human interaction. Psychology of Learning and Motivation, 59, 43–95.
De Groot, A. M., Delmaar, P., & Lupker, S. J. (2000). The processing of interlexical homographs in translation recognition and lexical decision: Support for non-selective access to bilingual memory. The Quarterly Journal of Experimental Psychology, 53A(2), 397–428.
Dijkstra, T., Grainger, J., & van Heuven, W. J. (1999). Recognition of cognates and interlingual homographs: The neglected role of phonology. Journal of Memory and Language, 41(4), 496–518.
Dijkstra, T., & van Heuven, W. J. (2002). The architecture of the bilingual word recognition system: From identification to decision. Bilingualism: Language and Cognition, 5(3), 175–197.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second language acquisition. Routledge.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3), 195–225.
Elman, J. L. (2004). An alternative view of the mental lexicon. Trends in Cognitive Sciences, 8(7), 301–306.
Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical knowledge without a lexicon. Cognitive Science, 33(4), 547–582.
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time. Journal of Memory and Language, 12(6), 627–635.
French, R. M. (1998). A simple recurrent network model of bilingual memory. In Gernsbacher, M. A. & Derry, S. J. (Eds.), Proceedings of the 20th Annual Cognitive Science Society Conference (pp. 368–373). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gibbs, R., & Matlock, T. (2001). Psycholinguistic perspectives on polysemy. In Cuyckens, H. & Zawada, B. (Eds.), Polysemy in cognitive linguistics (pp. 213–239). Amsterdam: John Benjamins.
Grosjean, F. (1994). Individual bilingualism. In The encyclopedia of language and linguistics (pp. 1656–1660). Oxford: Pergamon Press.
Grosjean, F. (1995). A psycholinguistic approach to code-switching: The recognition of guest words by bilinguals. In Milroy, L. & Muysken, P. (Eds.), One speaker, two languages (pp. 259–275). Cambridge: Cambridge University Press.
Grosjean, F. (2001). The bilingual’s language modes. In Nicol, J. (Ed.), One mind, two languages: Bilingual language processing (pp. 1–22). Oxford: Blackwell.
Hills, T. T., Maouene, J., Riordan, B., & Smith, L. (2010). The associative structure of language: Contextual diversity in early word learning. Journal of Memory and Language, 63(3), 259–273.
Hoffman, R. R. (1980). Metaphor in science. In Honeck, R. P. & Hoffman, R. R. (Eds.), The psycholinguistics of figurative language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Jacquet, M., & French, R. M. (2002). The BIA++: Extending the BIA+ to a dynamical distributed connectionist framework. Bilingualism: Language and Cognition, 5(3), 202–205.
Johns, B. T., Dye, M., & Jones, M. N. (2016). The influence of contextual diversity on word learning. Psychonomic Bulletin and Review, 23(4), 1214–1220.
Ju, M., & Luce, P. A. (2004). Falling on sensitive ears: Constraints on bilingual lexical activation. Psychological Science, 15(5), 314–318.
Kaushanskaya, M., & Marian, V. (2004). Activation of non-target language phonology during bilingual visual word recognition: Evidence from eye-tracking. In Forbus, K., Gentner, D., & Regier, T. (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 654–659). Hillsdale, NJ: Lawrence Erlbaum Associates.
Kawamoto, A. H. (1993). Nonlinear dynamics in the resolution of lexical ambiguity: A distributed processing account. Journal of Memory and Language, 32, 474–516.
Kim, K. H., Relkin, N. R., Lee, K. M., & Hirsch, J. (1997). Distinct cortical areas associated with native and second languages. Nature, 388(6638), 171–174.
Kroll, J. F., & Bialystok, E. (2013). Understanding the consequences of bilingualism for language processing and cognition. Journal of Cognitive Psychology, 25(5), 497–514.
Lehrer, A. (1974). Semantic fields and lexical structure. Amsterdam: John Benjamins.
Li, P. (1996). Spoken word recognition of code-switched words by Chinese–English bilinguals. Journal of Memory and Language, 35(6), 757–774.
Libben, M. R., & Titone, D. A. (2009). Bilingual lexical access in context: Evidence from eye movements during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(2), 381–390.
Louwerse, M. M., Dale, R., Bard, E. G., & Jeuniaux, P. (2012). Behavior matching in multimodal communication is synchronized. Cognitive Science, 36(8), 1404–1426.
Lyons, J. (1963). Structural semantics. Oxford: Blackwell.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1), 35–54.
Macnamara, J., & Kushnir, S. L. (1971). Linguistic independence of bilinguals: The input switch. Journal of Memory and Language, 10(5), 480.
Marian, V., & Kaushanskaya, M. (2004). Self-construal and emotion in bicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.
Marian, V., & Spivey, M. (2003a). Bilingual and monolingual processing of competing lexical items. Applied Psycholinguistics, 24(2), 173–193.
Marian, V., & Spivey, M. (2003b). Competing activation in bilingual language processing: Within- and between-language competition. Bilingualism: Language and Cognition, 6(2), 97–115.
Marian, V., Spivey, M., & Hirsch, J. (2003). Shared and separate systems in bilingual language processing: Converging evidence from eyetracking and brain imaging. Brain and Language, 86(1), 70–82.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25(1–2), 71–102.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1–86.
McClelland, J. L., & Johnston, J. C. (1977). The role of familiar units in perception of words and nonwords. Perception and Psychophysics, 22(3), 249–261.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88(5), 375–407.
Meuter, R. F., & Allport, A. (1999). Bilingual language switching in naming: Asymmetrical costs of language selection. Journal of Memory and Language, 40(1), 25–40.
Miyake, A., Just, M. A., & Carpenter, P. A. (1994). Working memory constraints on the resolution of lexical ambiguity: Maintaining multiple interpretations in neutral contexts. Journal of Memory and Language, 33(2), 175–202.
Moreno, E. M., Federmeier, K. D., & Kutas, M. (2002). Switching languages, switching palabras (words): An electrophysiological study of code switching. Brain and Language, 80(2), 188–207.
Onnis, L., & Spivey, M. J. (2012). Toward a new scientific visualization for the language sciences. Information, 3, 124–150.
Plummer, P., Perea, M., & Rayner, K. (2014). The influence of contextual diversity on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(1), 275–283.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: II. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89(1), 60–94.
Schwartz, A. I., & Kroll, J. F. (2006). Bilingual lexical activation in sentence context. Journal of Memory and Language, 55(2), 197–212.
Spevack, S. C., Falandays, J. B., Batzloff, B., & Spivey, M. J. (2018). Interactivity of language. Language and Linguistics Compass, 12(7), e12282.
Spivey, M. J. (2008). The continuity of mind. New York: Oxford University Press.
Spivey, M. J., & Cardon, C. D. (2015). Methods for studying adult bilingualism. In Schwieter, J. (Ed.), The Cambridge handbook of bilingual language processing (pp. 108–132). New York: Cambridge University Press.
Spivey, M. J., & Dale, R. (2006). Continuous dynamics in real-time cognition. Current Directions in Psychological Science, 15(5), 207–211.
Spivey, M. J., & Marian, V. (1999). Cross talk between native and second languages: Partial activation of an irrelevant lexicon. Psychological Science, 10(3), 281–284.
Spivey-Knowlton, M. J. (1996). Integration of visual and linguistic information: Human data and model simulations. Unpublished doctoral dissertation, University of Rochester.
Strauss, J., Harris, H. D., & Magnuson, J. S. (2007). jTRACE: A reimplementation and extension of the TRACE model of speech perception and spoken word recognition. Behavior Research Methods, 39(1), 19–30.
Swinney, D. A. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior, 18(6), 645–659.
Tabossi, P. (1988). Accessing lexical ambiguity in different types of sentential contexts. Journal of Memory and Language, 27, 324–340.
Tanenhaus, M. K., Leiman, J. M., & Seidenberg, M. S. (1979). Evidence for multiple stages in the processing of ambiguous words in syntactic contexts. Journal of Verbal Learning and Verbal Behavior, 18(4), 427–440.
Toscano, J. C., Anderson, N. D., & McMurray, B. (2013). Reconsidering the role of temporal order in spoken word recognition. Psychonomic Bulletin and Review, 20(5), 981–987.
van Hell, J. G., & Tanner, D. (2012). Second language proficiency and cross-language lexical activation. Language Learning, 62, 148–171.
van Heuven, W. J., Dijkstra, T., & Grainger, J. (1998). Orthographic neighborhood effects in bilingual word recognition. Journal of Memory and Language, 39(3), 458–483.
Vu, H., Kellas, G., & Paul, S. T. (1998). Sources of sentence constraint on lexical ambiguity resolution. Memory and Cognition, 26(5), 979–1001.
Weber, A., & Cutler, A. (2004). Lexical competition in non-native spoken-word recognition. Journal of Memory and Language, 50(1), 1–25.

Figure 2.1 Theory visualizations of lexical fields in linguistic state space: (a) lexical ambiguity involves a highly nonconvex shape that covers unrelated regions of semantic space; (b) polysemy involves a relatively more convex shape that includes interstitial regions of semantic space; and (c) temporary phonological ambiguity, as with cohorts, often involves a highly nonconvex shape again, one that depends heavily on temporal dynamics.
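The convex/nonconvex contrast in Figure 2.1 can be made concrete with a toy measure. The sketch below is an illustration under stated assumptions, not a method from the chapter: it treats a lexical field as a union of disks in a hypothetical two-dimensional semantic space, and it scores convexity as the fraction of midpoints between pairs of in-field points that also fall inside the field (a convex region scores 1.0). The function names `in_field` and `convexity_score`, the disk centers, and the grid spacing are all assumptions introduced for illustration.

```python
import itertools

import numpy as np


def in_field(point, centers, radius):
    """True if `point` lies inside any disk of the (hypothetical) lexical field."""
    return any(np.linalg.norm(point - c) <= radius for c in centers)


def convexity_score(centers, radius, grid_step=0.25):
    """Fraction of midpoints between pairs of in-field grid points that also
    fall inside the field. A convex field scores 1.0; a field split across
    unrelated regions scores much lower. This measure is an illustrative assumption."""
    centers = [np.asarray(c, dtype=float) for c in centers]
    # Square grid that covers every disk in the field.
    lo = min(c.min() for c in centers) - radius
    hi = max(c.max() for c in centers) + radius
    axis = np.arange(lo, hi + grid_step, grid_step)
    points = [np.array(p) for p in itertools.product(axis, axis)
              if in_field(np.array(p), centers, radius)]
    pairs = list(itertools.combinations(points, 2))
    inside = sum(in_field((a + b) / 2.0, centers, radius) for a, b in pairs)
    return inside / len(pairs)


# Homonymy: two far-apart regions of a toy 2-D semantic space (two unrelated meanings).
homonym = convexity_score([(-3.0, 0.0), (3.0, 0.0)], radius=1.0)
# Polysemy: two overlapping regions, so the interstitial space is covered as well.
polyseme = convexity_score([(-0.75, 0.0), (0.75, 0.0)], radius=1.0)
print(f"homonym: {homonym:.2f}, polyseme: {polyseme:.2f}")
```

Midpoints are used because a set is convex exactly when it contains every segment between its points; checking only midpoints on a grid is a cheap approximation of that definition, chosen here for brevity.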

Figure 2.2 Individual differences in lexical fields: (a) a person with low memory span or limited English experience would have a functionally narrow lexical field for the word boxer, whereas (b) a person with high memory span or extensive English experience would have a more tentacular lexical field for boxer, with tendrils that stretch into a variety of semantic spaces.

Figure 2.3 Contextual diversity of lexical fields: (a) one region of linguistic-genre space in which the lexical field for posterior shows itself to be nondiverse and rather convex; (b) another region of the same space in which the lexical field for piglet stretches itself nonconvexly into diverse contexts.

Figure 2.4 (a) McClelland and Rumelhart's (1981) interactive activation model processing the letter R; (b) Kawamoto's (1993) PDP model of lexical ambiguity resolution, with only a sample of its connections shown; (c) an example energy landscape that determines the trajectory of a system as it traverses its state space; and (d) Dijkstra and van Heuven's (2002) BIA+ model.
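Panel (c)'s idea that an energy landscape determines the trajectory of a system through its state space can be sketched in a few lines. This is a minimal illustration, not Kawamoto's model or any model from the chapter: it assumes a hypothetical two-dimensional state space with a double-well energy function whose two minima stand in for the two meanings of an ambiguous word, and it lets the state follow the negative energy gradient. The names `energy`, `gradient`, and `settle`, and the step size, are assumptions made for this example.

```python
import numpy as np


# Illustrative double-well energy over a hypothetical 2-D state space.
# The minima at (-1, 0) and (1, 0) stand in for two competing interpretations
# of an ambiguous word; this function is an assumption, not the chapter's model.
def energy(state):
    x, y = state
    return (x**2 - 1.0) ** 2 + y**2


def gradient(state):
    """Analytic gradient of `energy`."""
    x, y = state
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])


def settle(start, step=0.05, n_steps=500):
    """Follow the negative energy gradient from `start`; return every visited state."""
    state = np.asarray(start, dtype=float)
    trajectory = [state.copy()]
    for _ in range(n_steps):
        state = state - step * gradient(state)
        trajectory.append(state.copy())
    return np.array(trajectory)


# Start near the ridge between the wells, nudged slightly toward x > 0
# (e.g., by weakly supportive context): the state settles into the right-hand well.
right = settle([0.05, 0.8])
# The mirror-image nudge settles into the left-hand well instead.
left = settle([-0.05, 0.8])
print(right[-1].round(3), left[-1].round(3))
```

The small initial nudge is the point of the example: near the ridge between the wells, a slight bias decides which attractor the system falls into, and energy never increases along the way.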

Figure 2.5 (a) For a monolingual, the linguistic input shark has orthographic and phonological similarity to both shark and sharp, as well as a few other words; (b) for a bilingual, that same input has similarity with even more lexical representations, thus producing an extremely nonconvex lexical field and an even more nonlinear trajectory.
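The contrast in Figure 2.5 is that adding a second lexicon adds more partially matching competitors for the same input. One way to sketch that effect is shown below. It assumes position-specific letter overlap as the similarity measure (loosely in the spirit of interactive activation models) and a Luce-style normalization of activation. It is not the BIA or BIA+ implementation. The second-language items are hypothetical placeholder strings, not real words, and the exponent is an arbitrary assumption.

```python
from typing import Dict, List


def slot_overlap(a: str, b: str) -> int:
    """Count letters that match in the same position (position-specific coding)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def activation_shares(word: str, lexicon: List[str], exponent: float = 3.0) -> Dict[str, float]:
    """Give each lexical entry a share of activation for the input `word`.

    Raw support is overlap ** exponent (the exponent sharpens the contrast
    between exact and partial matches; its value is an arbitrary assumption),
    and shares are normalized to sum to 1, as in a Luce choice rule.
    """
    support = {w: slot_overlap(word, w) ** exponent for w in lexicon}
    total = sum(support.values())
    return {w: s / total for w, s in support.items()}


# English neighbors of the input "shark" (4 or 5 letters match in position).
english = ["shark", "sharp", "share", "shard", "spark"]
# Hypothetical placeholder strings standing in for similar words in a second language.
second_language = ["sharq", "shara", "charq"]

mono = activation_shares("shark", english)
bi = activation_shares("shark", english + second_language)
print(f"target share, monolingual: {mono['shark']:.3f}, bilingual: {bi['shark']:.3f}")
```

With the extra competitors, the target's share of activation drops, which is one simple way of seeing why the bilingual trajectory toward "shark" would be expected to be less direct.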
