The processing advantage of multiword sequences: A meta-analysis

Wei Yi; Yanlu Zhong

doi:10.1017/S0272263123000542

Published online by Cambridge University Press: 06 November 2023

Wei Yi*: School of Chinese as a Second Language, Peking University, Beijing, China
Yanlu Zhong: School of Chinese as a Second Language, Peking University, Beijing, China
*Corresponding author: Wei Yi; Email: weiyisla@pku.edu.cn

Abstract

This meta-analysis synthesized 35 English studies (130 effect sizes, N = 1,981) that employed online tasks to investigate the processing of multiword sequences (MWSs). We examined (a) to what extent MWSs enjoy a processing advantage over novel word combinations; (b) how such a processing advantage is moderated by statistical regularities (i.e., phrasal frequency, association strength), MWS type, and explicitness of experimental tasks; and (c) whether such moderating patterns differ between L1 speakers and L2 speakers. The results confirmed the processing advantage for most subtypes of MWSs, with effect sizes ranging from small to medium. For L1 speakers and L2 speakers, the processing advantage of MWSs was found across the continuum of phrasal frequency and association strength, although its magnitude varied. Interestingly, task explicitness moderated the processing advantage of MWSs but only for L2 speakers. Taken together, our results shed light on the understanding of MWSs as well as directions for future research.

Type: Research Article
Information: Studies in Second Language Acquisition, Volume 46, Issue 2, May 2024, pp. 427–452

DOI: https://doi.org/10.1017/S0272263123000542
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Introduction

Like single words, multiword sequences (MWSs) are said to be building blocks for language acquisition and processing (Christiansen & Arnon, 2017). Over the past decades, a growing number of studies have reported the processing advantage of MWSs over novel word combinations (i.e., word combinations generated creatively) for L1 speakers and L2 speakers (Conklin & Carrol, 2021; Jiang & Nekrasova, 2007; Siyanova-Chanturia et al., 2011). However, individual studies vary substantially in terms of experimental design, which leads to inconsistencies in research findings and leaves unclear (a) to what extent MWSs enjoy a processing advantage over novel word combinations, (b) which factors moderate the processing advantage of MWSs, and (c) whether L1 speakers and L2 speakers differ when processing MWSs. To address these issues, we systematically synthesized studies that employed online tasks to explore the processing advantage of MWSs. We incorporated statistical regularities (i.e., phrasal frequency and association strength), MWS type, and task explicitness as the moderators and compared their effects on the processing advantage of MWSs for L1 speakers and L2 speakers.

Literature review

MWSs and their processing advantage

Language can be represented at multiple levels. From the perspective of usage-based approaches (Christiansen & Arnon, 2017; Goldberg, 1995), MWSs, which are word strings that co-occur more often than by chance, are integral building blocks of language. MWSs cover a wide variety of linguistic phenomena, including but not limited to idioms (e.g., kick the bucket), speech formulae (e.g., what’s up), phrasal verbs (e.g., take off), binomials (e.g., knife and fork), collocations (e.g., make progress), and lexical bundles (e.g., is one of the). Corpus studies have found that MWSs are highly frequent and widely used in language (Biber, 2009). Moreover, they facilitate the development of fluency and nativelikeness for language speakers (Wray & Perkins, 2000). Many researchers (e.g., Wray, 2002) believe that MWSs, especially formulaic (i.e., highly conventionalized) ones, are prefabricated chunks stored in memory. If so, they may enjoy a processing advantage over novel word combinations and free up cognitive resources by reducing the time pressure during language processing (Christiansen & Chater, 2016).

Over the past decades, the processing advantage of MWSs has attracted an increasing amount of attention. A plethora of studies have found that various types of MWSs are processed significantly faster than are novel word strings, with similar results being reported in both children (Bannard & Matthews, 2008) and adults (Arnon & Snider, 2010), for both L1 speakers (Siyanova-Chanturia et al., 2011) and L2 speakers (Wolter & Yamashita, 2018), using both online (Sonbul, 2015; Tremblay & Baayen, 2010) and offline tasks (Sonbul, 2015). However, given that the experimental design varies drastically across studies, it remains difficult to estimate to what extent MWSs are processed significantly faster than are novel word combinations.

Variables that may moderate the processing advantage of MWSs

Considerable evidence supports the processing advantage of MWSs over novel word combinations. However, a comprehensive understanding of the variables that could potentially moderate this advantage and the mechanisms through which they operate is still lacking. In this section, we will review three variables that seem most worth considering.

Statistical regularities

Language input is not random. Instead, it is characterized by underlying statistical regularities. Language users can capture distributional information and make use of it for phonological learning, word segmentation, and syntactic learning (for reviews, see Ellis, 2002; Saffran, 2003). Usage-based approaches (Christiansen & Arnon, 2017; Goldberg, 1995) hold that language acquisition is exemplar based and shaped by repeated language use. By tracking statistical regularities, language users can establish and modify mental representations of linguistic units at various levels.

Recent research suggests that two types of statistical regularities play crucial roles in the processing of MWSs, namely phrasal frequency and association strength (Gries & Ellis, 2015; Yi, 2018; Yi et al., 2017). Phrasal frequency indicates how likely a word string is to be experienced by language users (Gries & Ellis, 2015). In contrast, association strength captures the probability that the words constituting an MWS co-occur and how well language users can predict the words following or preceding another word in a sequence (Gablasova et al., 2017). Phrasal frequency and association strength are usually computed from corpus data, with the latter measured by metrics such as mutual information (MI), transitional probability, T-score, delta P, and log Dice (for reviews, see Gries, 2022; Gries & Ellis, 2015; Yi et al., 2023). Recent studies have demonstrated that L1 speakers and L2 speakers are sensitive to both phrasal frequency (Arnon & Snider, 2010; Tremblay & Baayen, 2010; Yi, 2018; Yi et al., 2017) and association strength (Gyllstad & Wolter, 2016; Öksüz et al., 2021; Yi, 2018; Yi et al., 2017) when processing MWSs. Specifically, higher frequency MWSs are processed significantly faster than are lower frequency ones (e.g., Durrant, 2008; Kim & Kim, 2012; Sonbul, 2015). Similarly, MWSs consisting of words that are more closely associated also tend to be processed more efficiently than those constituted by loosely associated words (Öksüz et al., 2021; Yi et al., 2017).

The effects of statistical regularities on the processing of MWSs have been observed across the entire range (Arnon & Snider, 2010; Yi, 2018; Yi et al., 2017). Nevertheless, methodologically, researchers often define and select MWSs based on certain thresholds (e.g., occurring at least 10 times per million words; Biber et al., 1999). Consequently, examining statistical regularities as moderators of the processing advantage of MWSs would enable us to determine whether this processing advantage is restricted to specific statistical profiles. Moreover, it would provide insight into whether variations in the processing advantage of MWSs can be attributed to variations in statistical regularities, thereby enhancing our understanding of whether language users use statistical information to process MWSs more efficiently.

MWS Type

MWS is an umbrella term that encompasses a wide range of larger-than-word units. MWSs are not as homogenous as one might think; instead, subtypes of MWSs vary drastically in terms of structural, semantic, and syntactic characteristics. For instance, whereas MWSs such as idioms, collocations, binomials, and phrasal verbs are structurally complete and self-contained, lexical bundles are structurally incomplete and often span syntactic boundaries (Jeong & Jiang, 2019). Idioms are semantically figurative and noncompositional, with constituent words contributing little to the meaning of the whole word string (Carrol & Conklin, 2020). In contrast, lexical bundles are semantically transparent and compositional, whereas collocations (e.g., build a career vs. build a house) and phrasal verbs (e.g., rise up vs. heat up) can be interpreted both literally and figuratively.

A wealth of studies has demonstrated the processing advantage of MWSs, including idioms (Siyanova-Chanturia et al., 2011), lexical bundles (Tremblay et al., 2011), binomials (Carrol & Conklin, 2020), collocations (Wolter & Yamashita, 2018), and phrasal verbs (Cappelle et al., 2010; Hanna et al., 2017). However, little research has examined whether such a processing advantage varies significantly across subcategories of MWSs. Columbus and Wood (2010) compared the processing of idioms, lexical bundles, and collocations using an eye-tracking reading task. The results showed that L1 speakers of English read all three types of MWSs more quickly than control items, and idioms were processed faster than were lexical bundles and collocations. Jeong and Jiang (2019) examined the processing of structurally complete formulaic expressions (e.g., for example) and structurally incomplete lexical bundles (e.g., is one of the most) using a word-monitoring task. For both L1 speakers and L2 speakers of English, a processing advantage was found for formulaic sequences but not for lexical bundles. Gyllstad and Wolter (2016) compared the processing of novel word combinations (e.g., sing a song) and collocations (e.g., keep a secret) by L1 speakers and L2 speakers, using a semantic judgment task. The results showed a processing disadvantage for semitransparent collocations relative to semantically fully transparent word combinations, indicating that semantic transparency plays an important role in the processing of MWSs. By incorporating MWS type as a moderator, the extent of the processing advantage for subcategories of MWSs can be unveiled. This would enhance our comprehension of the multifaceted nature of multiword expressions, identify the subcategories of MWSs that pose greater challenges for L2 speakers to acquire, and contribute to the development and refinement of language processing models.

Explicitness of experimental tasks

Both online and offline experimental tasks have been used to study the processing of MWSs. Unlike offline tasks, online tasks are performed under significant time pressure and are therefore more likely to tap into the cognitive processes involved in the processing of MWSs (Siyanova-Chanturia, 2015). Consequently, it is not uncommon for the result patterns revealed by offline tasks to be inconsistent with those obtained from online tasks (Pellicer-Sánchez et al., 2022; Sonbul, 2015). Recent studies (Suzuki & DeKeyser, 2017) in the field of second language acquisition have shown that experimental tasks also vary in the degree of task explicitness. Task explicitness refers to the extent to which a task requires explicit or implicit knowledge. For instance, Suzuki and DeKeyser (2015, 2017) concluded that eye tracking (i.e., visual-world paradigm), self-paced reading, and word-monitoring tasks measure implicit knowledge, whereas timed grammaticality judgment and elicited imitation tap into automatized explicit knowledge. Yi (2018) adopted a phrasal acceptability judgment task (PJT) to examine the processing of English collocations by L1 speakers and L2 speakers. Based on the correlational relationships between PJT performance and language aptitudes, he suggested that L1 speakers might process collocations implicitly, whereas L2 speakers might process collocations explicitly. To the best of our knowledge, little research has investigated whether task explicitness affects the processing advantage of MWSs. By incorporating task explicitness as a moderator in a meta-analysis, we could explore whether the processing advantage of MWSs is task dependent, thus shedding light on the cognitive mechanism underlying MWS processing as well as the selection of experimental tasks for future studies.

Potential differences between L1 speakers and L2 speakers

Numerous studies have found that both L1 speakers (Carrol & Conklin, 2020) and L2 speakers (Jeong & Jiang, 2019; Wolter & Yamashita, 2018) process MWSs significantly faster than they do novel word combinations. However, research findings have not always been consistent and the magnitude of the processing advantage of MWSs varies among studies. For example, although the processing advantage of idioms is well established for L1 speakers (Siyanova-Chanturia et al., 2011), some studies (Carrol & Conklin, 2014) did not replicate such patterns for L2 speakers. In addition, among those studies that compared L1 speakers and L2 speakers, some found the processing advantage of MWSs, such as idioms (Siyanova-Chanturia et al., 2011) and lexical bundles (Jeong & Jiang, 2019), for L1 speakers but not for L2 speakers. Such inconsistencies create difficulties in determining whether the processing advantage of MWSs varies in degree between L1 speakers and L2 speakers and whether subtypes of MWSs may or may not enjoy a processing advantage for L1 speakers and L2 speakers. Regarding language users’ sensitivity to statistical regularities when processing MWSs, Ellis et al. (2008) suggest that L1 speakers’ processing of MWSs could be primarily influenced by association strength (measured by MI), whereas L2 speakers might exhibit greater sensitivity to phrasal frequency. Considering the significant disparities in learning context and language experience between L1 speakers and L2 speakers, it is crucial to explore whether the moderating effects of statistical regularities and task explicitness on the processing advantage of MWSs differ between these two groups.

The current study

As our review has shown, although many studies have reported the processing advantage of MWSs, due to methodological variations across studies, it remains unknown (a) to what extent MWSs are processed faster than novel word combinations; (b) how the processing advantage of MWSs is moderated by statistical regularities, type of MWSs, and task explicitness; and (c) whether such moderating patterns differ between L1 speakers and L2 speakers. Meta-analysis allows the examination of variables that were not the focus of individual studies. To address these research questions (RQs), we conducted a meta-analysis and synthesized studies that examined the processing advantage of MWSs using online experimental tasks.

Method

Literature search

We used the following databases to identify studies to include in this meta-analysis: Education Resources Information Center (ERIC), Google, Google Scholar, Linguistics and Language Behavior Abstracts (LLBA), Oxford Bibliographies, ProQuest Dissertations and Theses, PsycINFO, VARGA (Vocabulary Acquisition Research Group Archive), and Web of Science. We chose to start our meta-analysis from the year 2000 because there was a growing interest in the processing advantage of MWSs during that period and Wray’s (2002) work made a significant contribution to the field and provided the basis for future studies on the topic. We searched these databases for abstracts published from January 2000 to January 2022 using the following keywords: binomials, collocations, formulaic language, formulaic sequences, formulaic speech, frozen phrases, idioms, lexical bundles, multiword expressions, multiword sequences, multiword units, prefabricated language, prefabricated patterns, phrasal verbs, word combinations, and word sequences. Additionally, we conducted a forward citation search on reports containing the above terms (see Appendix S1 for the PRISMA flow diagram for the inclusion of studies).

Inclusion and exclusion criteria

The following criteria were employed to determine which studies to include in this meta-analysis.

1) Significant discrepancies exist in the characteristics of MWSs across languages. To ensure the comparability of research findings, we included only studies that were written in English and used English stimuli.
2) We excluded literature reviews and empirical studies that focused on the instruction and acquisition of novel multiword expressions.
3) We included studies that investigated the processing of MWSs using online tasks and excluded those that used only offline tasks.
4) We included studies that adopted reading time and reaction time (RT) as the outcome variable, following previous studies (Avery & Marsden, 2019).
5) We excluded those that only reported accuracies of experimental trials (accuracy data are less replicable and difficult to interpret; Jiang, 2012) or neural responses (e.g., EEG, ERP, fMRI).
6) We excluded studies that used production tasks (e.g., word naming), given that production tasks tap into cognitive processes that are different from those underlying perception tasks (e.g., lexical decision). Specifically, language users encode input into multiword units and pass them to a higher level of linguistic representation during perception tasks (Christiansen & Chater, 2016; Jiang, 2012), whereas they retrieve ready-made units appropriate for conversation and piece them together in the opposite direction during production tasks (Arnon & Cohen Priva, 2013).
7) We excluded studies for which full texts were not available.
8) We excluded studies that did not provide enough information for calculating effect sizes (i.e., mean, SD, number of participants or items).

Thirty-five studies (130 effect sizes, 1,981 participants) met all the criteria. These studies comprised 29 journal articles, two doctoral dissertations, three book chapters, and one conference proceeding (see Appendix S2).

Coding

We coded the included studies for study identifiers, moderator variables, language background, and descriptive statistics for calculating effect sizes (see Appendix S3 for the coding sheet).

Statistical regularities

We coded phrasal frequency and association strength of MWSs separately. Specifically, we chose MI, forward delta P, and backward delta P as measures of association strength for the following reasons. First, many studies have documented that L1 speakers and L2 speakers are sensitive to the association strength of MWSs measured by MI and delta P (Gries, 2013; Öksüz et al., 2021; Yi, 2018). Second, MI treats the association between words as nondirectional, whereas delta P can evaluate the degree to which the first or the last word in a sequence is predicted by other words in a forward or backward direction (Gries, 2013). Therefore, using both MI and delta Ps should allow us to test the directionality of association strength for MWSs (for the calculation of MI and delta Ps, see Appendix S4).
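To make the contrast between a nondirectional and a directional measure concrete, the sketch below computes MI and the two delta P values for a two-word sequence from raw corpus counts. It assumes the standard textbook definitions (pointwise MI on a log2 scale; delta P as a difference of conditional probabilities in a 2 × 2 contingency table), which may differ in detail from the formulas in Appendix S4. The function name, the argument names, and the counts in the usage note are hypothetical.

```python
import math


def association_measures(f_pair, f_w1, f_w2, n_pairs):
    """Return (MI, forward delta P, backward delta P) for the sequence "w1 w2".

    f_pair  : number of times "w1 w2" occurs
    f_w1    : number of two-word pairs whose first word is w1
    f_w2    : number of two-word pairs whose second word is w2
    n_pairs : total number of two-word pairs in the corpus
    The names and the 2 x 2 layout are illustrative assumptions.
    """
    # Cells of the 2 x 2 contingency table.
    a = f_pair                 # w1 followed by w2
    b = f_w1 - f_pair          # w1 followed by some other word
    c = f_w2 - f_pair          # some other word followed by w2
    d = n_pairs - a - b - c    # neither w1 first nor w2 second
    if a <= 0 or min(b, c, d) < 0:
        raise ValueError("counts are inconsistent or the pair never occurs")
    if c + d == 0 or b + d == 0:
        raise ValueError("delta P is undefined for these counts")

    # MI compares the observed pair count with the count expected by chance;
    # swapping the roles of w1 and w2 leaves it unchanged (nondirectional).
    mi = math.log2(a * n_pairs / (f_w1 * f_w2))

    # Forward delta P: how much w1 raises the probability of w2 coming next.
    forward = a / (a + b) - c / (c + d)
    # Backward delta P: how much w2 raises the probability of w1 preceding it.
    backward = a / (a + c) - b / (b + d)
    return mi, forward, backward
```

With hypothetical counts such as `association_measures(50, 100, 200, 10_000)`, the forward value is roughly twice the backward value, because the first word is the rarer of the two and therefore the better cue. MI cannot show this asymmetry, which is the motivation the text gives for using both kinds of measure.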

Studies included in this meta-analysis varied in their choice of corpora when retrieving phrasal frequency and association strength for MWSs. To ensure the validity of statistical regularities, we followed the practice of Lindstromberg and Eyckmans (2020) and requeried the corpus statistics for MWSs in each study from the British National Corpus (BNC). The selection of BNC over other corpora, such as COCA (Corpus of Contemporary American English), was predicated on several factors. First, a greater number of studies (N = 18) have reported statistical regularities derived from BNC than reported from COCA (N = 9). Second, despite potential disparities between British and American English, prior research (Sonbul, 2015; Yi et al., 2023) has shown that corpus data from BNC and COCA are similar and highly correlated. Following previous studies (Siyanova-Chanturia & Spina, 2015; Yi et al., 2023), we first calculated the mean phrasal frequency, mean MI, and mean delta Ps based on statistics extracted from the BNC. We then ranked the means from lowest to highest and used the lower, median, and upper quartiles as cutoff points to split the statistical regularities into low, medium, high, and very high bins as shown in Table 1 (see Appendix S5 for more details).
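The quartile-based binning can be sketched as follows. This is a minimal illustration and not the authors' script: the quartile interpolation method and the rule that a value equal to a cutoff goes to the higher bin are assumptions, and the function name and the input values in the usage note are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

BIN_LABELS = ["low", "medium", "high", "very high"]


def assign_bins(values):
    """Label each value with the quartile-based bin it falls into."""
    # Three cutoffs: lower quartile, median, upper quartile.
    cuts = quantiles(values, n=4, method="inclusive")
    # bisect_right counts how many cutoffs are <= the value, so a value
    # equal to a cutoff lands in the higher bin (an arbitrary choice).
    return [BIN_LABELS[bisect_right(cuts, v)] for v in values]
```

For example, `assign_bins([1, 2, 3, 4, 5, 6, 7, 8])` places two values in each of the four bins.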

Table 1. Average statistical regularities of MWSs across bins

Note. n represents the number of effect sizes in each bin. NA indicates that there was no data in the bin. Control refers to novel word combinations. Phrasal frequencies were transformed to number of occurrences per million words. Missing values were found for phrasal frequency (11), MI (11), forward delta P (13), and backward delta P (13).

MWS type

MWSs originally labeled in the included studies consisted of collocations (15 studies), idioms (11 studies), lexical bundles (4 studies), binomials (4 studies), formulaic sequences (2 studies), and multiword sequences/units (2 studies). After double-checking the word strings in the original research, we relabeled the stimuli in some studies and coded MWSs in all included studies into five categories: collocations (15 studies, 64 effect sizes), idioms (11 studies, 31 effect sizes), lexical bundles (6 studies, 17 effect sizes), binomials (4 studies, 12 effect sizes), and phrasal verbs (1 study, 6 effect sizes). Details about the grouping and definitions of each type of MWSs are provided in Appendix S6.

Task explicitness

The experimental tasks that were employed in the included studies comprised eye-tracking reading (ET; 13 studies, 43 effect sizes), grammaticality judgment (GJT; 1 study, 4 effect sizes), lexical decision (LDT; 8 studies, 32 effect sizes), phrasal acceptability judgment (PJT; 9 studies, 37 effect sizes), self-paced reading (SPR; 3 studies, 10 effect sizes), and word monitoring (WM; 1 study, 4 effect sizes; see Appendix S7 for the descriptions of tasks). Previous studies (Ellis, 2005; Suzuki & DeKeyser, 2015, 2017) operationalized implicit and explicit knowledge tapped into by tasks based on the degree of awareness, time available, focus of attention, and metalinguistic knowledge, among which awareness is regarded as the core criterion. Given that awareness is subjective and extremely difficult to measure in real time, we dropped the awareness criterion and coded the explicitness of tasks based on time available, focus of attention, and metalinguistic knowledge.

Language users tend to be more aware of their knowledge employed in tasks carried out under less time pressure (Suzuki & DeKeyser, 2017). Following this, tasks were coded as most explicit when participants completed them without time limits (i.e., at their own pace), whereas tasks were coded as less or least explicit when participants completed them as quickly as possible or with time limits.¹ We operationalized the focus of attention for each task based on the availability of contextual information when processing MWSs given that language users are more likely to focus on meaning rather than on form when more contextual support is provided (Long, 1991; Suzuki & DeKeyser, 2017). Specifically, tasks (e.g., PJT, LDT) were coded as most explicit when MWSs were presented to participants without any contextual support, whereas tasks (e.g., SPR) were coded as less explicit when MWSs were embedded in individual sentences. Tasks (e.g., ET) were coded as least explicit when MWSs were presented to participants in paragraphs. Regarding metalinguistic knowledge, tasks (e.g., GJT) were coded as most explicit when they involved the use of analytic linguistic rules, whereas tasks were coded as less (e.g., LDT, PJT) or least explicit (e.g., SPR, ET) when they required minimum or no use of metalinguistic knowledge. Based on the scheme presented above, we assigned 1, 2, or 3 points to each category of the three dimensions to indicate increasing levels of task explicitness. Then, we rated each task on these dimensions and added up the points (range: 4–8; see Table 2;² see Appendix S3 for the coding of task explicitness in each study). We separated the points into two groups, where tasks receiving 4 and 5 points were considered as low explicitness (45 effect sizes), and tasks receiving 6, 7, and 8 points were considered as high explicitness (85 effect sizes).
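The scoring scheme amounts to summing three ratings and applying a cutoff, as the sketch below shows. The function name is hypothetical, and the ratings in the usage note are made-up illustrations, not the values in Table 2 or Appendix S3.

```python
def explicitness_group(time_available, focus_of_attention, metalinguistic_knowledge):
    """Sum three 1-3 ratings and classify the total as low or high explicitness."""
    ratings = (time_available, focus_of_attention, metalinguistic_knowledge)
    if any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("each dimension must be rated 1, 2, or 3")
    total = sum(ratings)
    # In the text, 4-5 points count as low and 6-8 points as high.
    # Totals of 3 and 9 did not occur in the study; folding them into the
    # nearest group here is an assumption.
    group = "low" if total <= 5 else "high"
    return total, group
```

A task rated 2, 3, and 2 on the three dimensions would total 7 points and fall in the high-explicitness group, whereas one rated 2, 1, and 1 would total 4 points and fall in the low-explicitness group.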

Table 2. The coding of task explicitness

Note. n = the number of studies.

Language background

To examine whether L2 speakers differ from L1 speakers during the online processing of MWSs, we coded participants’ language backgrounds as L1 (76 effect sizes) or L2 (54 effect sizes). We considered coding L2 proficiency into beginning, intermediate, and advanced levels, yet this idea was abandoned because the majority of included studies (97.1%) recruited advanced-level L2 participants and their measurement of L2 proficiency varied drastically across studies.

Coding procedure

The authors of this meta-analysis coded all the included studies separately using the developed coding scheme. We calculated the intercoder reliability using Cohen’s kappa test and found the agreement rate was κ = .997 (see Appendix S8 for Cohen’s kappa for each coded moderator).
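For readers unfamiliar with the statistic, Cohen's kappa corrects the raw agreement rate for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below implements this standard formula for two coders; the function name and the labels in the usage note are hypothetical and unrelated to the actual coding sheet.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions per label.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

With `cohens_kappa(["ET", "ET", "LDT", "LDT"], ["ET", "ET", "LDT", "ET"])`, the raw agreement is 75% but half of that is expected by chance, so kappa is 0.5.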

Data analysis

Calculation of effect size

To calculate effect sizes, Cohen’s d, the standardized mean difference of RTs from a study, was estimated and then converted to Hedges’s g by multiplying it by the correction factor J (Borenstein et al., 2009) to address biases arising from small sample sizes. The calculation of all the measures is described in Appendix S9. For eye-tracking studies, we used total reading time to calculate effect sizes, given that this measure sums all fixations made within a region of interest (Liversedge et al., 1998) and reflects the integration of information during language processing.
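The conversion from d to g can be sketched as below, using the independent-groups formulas given in Borenstein et al. (2009). This is an assumption made for illustration: the exact formulas the authors applied, including any adjustment for within-participant designs, are in Appendix S9. The sign convention (control minus MWS, so that a positive g means faster processing of MWSs) and all names and numbers are likewise hypothetical.

```python
import math


def hedges_g(mean_control, mean_mws, sd_control, sd_mws, n_control, n_mws):
    """Return (g, variance of g) for the RT difference between two conditions."""
    df = n_control + n_mws - 2
    pooled_sd = math.sqrt(
        ((n_control - 1) * sd_control**2 + (n_mws - 1) * sd_mws**2) / df
    )
    # Cohen's d: positive when the MWS condition has the shorter RT.
    d = (mean_control - mean_mws) / pooled_sd
    # Small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    var_d = (n_control + n_mws) / (n_control * n_mws) + d**2 / (2 * (n_control + n_mws))
    return g, j**2 * var_d
```

For instance, mean RTs of 600 ms (control) and 550 ms (MWS), SDs of 100 ms, and 20 observations per condition give d = 0.50 and a slightly smaller g of about 0.49, which shows how modest the correction is even at this sample size.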

Analysis procedure

We conducted the analyses in the R statistical environment (version 4.1.2, R Core Team, 2021) using the metafor package (version 3.0.2, Viechtbauer, 2010) and the clubSandwich package (version 0.5.5, Pustejovsky, 2022). To deal with potential Type I errors due to dependencies among effect sizes, we built three-level meta-regression models (e.g., Yanagisawa & Webb, 2021), which encompassed the sampling variance of each effect size (Level 1), the variance between effect sizes within the same study (Level 2), and the variance between studies (Level 3). Three-level meta-regression models can be viewed as an expansion of the conventional random-effects model (Yanagisawa & Webb, 2021). In addition to three-level regressions, we applied cluster-robust variance estimation (Hedges et al., 2010) with small sample adjustments (Tipton & Pustejovsky, 2015) to control biases due to the dependency of effect sizes (Pustejovsky, 2015). We set the significance level at α = .05. However, according to Greenland et al. (2016), it is advisable to perceive the p value as a continuous measure rather than a threshold. Therefore, following the practice of previous studies (e.g., Yanagisawa & Webb, 2021), p values lower than .10 were interpreted as marginal significance, indicating a trend effect.
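As background on the conventional random-effects model that the three-level models extend, the sketch below pools effect sizes with the DerSimonian-Laird estimator. This is a deliberately simpler, swapped-in technique and not the authors' estimation method: it treats every effect size as independent, which is exactly the assumption the additional level and the cluster-robust variance estimation are meant to relax. The function name and the numbers in the usage note are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Return (pooled effect, standard error, tau^2, Q) for independent effects."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects, each with a sampling variance")
    # Fixed-effect (inverse-variance) weights and mean.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    # DerSimonian-Laird estimate of the between-effect variance, floored at 0.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each sampling variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2, q
```

Calling `random_effects_pool([0.0, 0.4, 0.8], [0.04, 0.04, 0.04])` yields a pooled effect of 0.4 with τ² = 0.12. A three-level model would split that single τ² into a within-study and a between-study component.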

To estimate the overall processing advantage of MWSs over novel word combinations (RQ₁), we built a three-level model and calculated the weighted average effect size using the L1-L2 mixture data. Given the potential heterogeneity between L1 speakers and L2 speakers, we also split the data and calculated weighted average effect sizes for L1 speakers and L2 speakers, respectively. To evaluate the moderating effects of statistical regularities, MWS type, and task explicitness (RQ₂) on the processing advantage of MWSs and to examine potential differences between L1 speakers and L2 speakers in terms of such moderating effects (RQ₃), we followed the practice of previous studies (e.g., Yanagisawa & Webb, 2021) and conducted separate meta-regressions with these moderators for L1 speakers and L2 speakers. Additionally, we tested the interaction between language background (L1 vs. L2) and the moderators using the L1-L2 mixture data to examine whether group comparisons for each moderator differ significantly between L1 speakers and L2 speakers. For RQ₂ and RQ₃, given that control phrases in the included studies varied in phrasal frequency and association strength, we incorporated the statistical regularities of control phrases as covariates. In addition, we performed multiple comparisons by changing the reference levels (e.g., Yanagisawa et al., 2020) to examine how the processing advantage of MWSs varied between levels of each moderator. Multiple comparisons between levels in each moderator were indicated by unstandardized coefficient estimates and p values. Main effects of the moderators in three-level models were examined by a conservative type of Wald test (i.e., HTZ test; Tipton & Pustejovsky, 2015) from which F statistics were obtained.

Prior to the aggregation of effect sizes and moderator analyses, we conducted three analyses to assess the potential influence of publication bias on our data set: fail-safe N, Orwin’s fail-safe N, and the trim-and-fill method (Borenstein et al., 2009). All three measures indicated that no publication bias was present (see Appendix S10 for more details). Additionally, we conducted four sensitivity analyses to examine whether our meta-analysis results could be replicated under different scenarios (for more details, see Appendix S11).
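
To make the logic of Orwin’s fail-safe N concrete, the following minimal sketch applies the standard formula in Python. It is an illustration only: the function name, the criterion of 0.10, and the assumption that missing studies average an effect of zero are hypothetical choices, and the output is not the value reported in Appendix S10.

```python
def orwin_failsafe_n(k, mean_effect, criterion, missing_effect=0.0):
    """Orwin's fail-safe N.

    Number of unretrieved studies with an average effect of `missing_effect`
    that would be needed to pull the mean effect of `k` studies down to
    `criterion` (a "trivial" effect size chosen by the analyst).
    """
    if criterion <= missing_effect:
        raise ValueError("criterion must exceed the assumed effect of missing studies")
    return k * (mean_effect - criterion) / (criterion - missing_effect)


# k and mean_effect echo the 35 studies and the overall g reported in this article;
# the criterion of 0.10 is a hypothetical value chosen for illustration.
n_fs = orwin_failsafe_n(k=35, mean_effect=0.417, criterion=0.10)
print(round(n_fs, 2))
```

The larger this number is relative to the number of studies actually retrieved, the less plausible it is that unpublished null results alone could reduce the average effect to a trivial size.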

Results

After aggregating 130 effect sizes from 35 studies, we found that the weighted average effect size was significant (Hedges’s g = 0.417; 95% CI = [0.276, 0.558], p < .001), suggesting that MWSs are processed significantly faster than novel word combinations. Regarding the heterogeneity of effect sizes, a significant Q statistic (Q = 451.74, p < .001) was found, indicating heterogeneity of effect sizes across the studies. The estimated variance components were τ² = 0.152 between studies and τ² < 0.001 within studies. The results of I² revealed that 70.91% of the total variance could be attributed to between-study heterogeneity (I² = 70.91%), whereas almost no variance could be attributed to within-study heterogeneity (I² < 0.01%). In addition, the prediction interval was [-0.77, 1.72] (see Appendix S12 for the forest plot).
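
The decomposition of I² across levels can be sketched as below. This is a minimal illustration of one common approach for three-level models, in which each variance component is divided by the sum of both components and a "typical" sampling variance (the Higgins and Thompson estimate). The sampling variances used here are hypothetical, so the output is not expected to reproduce the 70.91% reported above.

```python
def typical_sampling_variance(variances):
    """Higgins-Thompson 'typical' sampling variance of a set of effect sizes."""
    k = len(variances)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (k - 1) * sum_w / (sum_w ** 2 - sum_w2)


def i_squared_shares(tau2_between, tau2_within, typical_var):
    """Percentage of total variance attributed to each level."""
    total = tau2_between + tau2_within + typical_var
    return {
        "between": 100 * tau2_between / total,
        "within": 100 * tau2_within / total,
        "sampling": 100 * typical_var / total,
    }


# Hypothetical sampling variances (NOT the variances of the included studies).
v_typical = typical_sampling_variance([0.05] * 10)

# The between-study variance echoes the reported 0.152; the within-study
# variance is set to 0 as a stand-in for the reported value below 0.001.
shares = i_squared_shares(tau2_between=0.152, tau2_within=0.0, typical_var=v_typical)
print({level: round(pct, 2) for level, pct in shares.items()})
```

Because the three shares sum to 100%, a between-study share near 71% together with a negligible within-study share implies that the remaining variance is attributable to sampling error.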

With language background as a moderator, the results showed that both groups of language users processed MWSs significantly faster than novel word combinations (for L1 speakers: Hedges’s g = 0.475, 95% CI = [0.313, 0.636], p < .001; for L2 speakers: Hedges’s g = 0.348, 95% CI = [0.220, 0.475], p < .001). The Wald test showed that the processing advantage of MWSs for L1 speakers was significantly greater than that for L2 speakers (F(1, 16.3) = 11.90, p = .003).

The results of the separate moderator analyses after splitting the data for L1 speakers and L2 speakers are presented in Table 3 and Table 4, respectively (see Appendix S13 for the results of the moderator analyses based on the L1-L2 mixture data). For both L1 speakers and L2 speakers, the processing advantages of MWSs across all the bins of phrasal frequency and association strength (see Figure 1) were significantly or marginally significantly greater than 0.

Table 3. Results of moderator analyses based on L1 data

Note. k = the number of studies; n = the number of effect sizes; g = Hedges’s g; b = unstandardized coefficient; CI = confidence interval; ref = reference level. Missing values were present when coding the moderators. The percentages of studies were calculated after excluding missing values.

Table 4. Results of moderator analyses based on L2 data

Figure 1. Processing advantages of MWSs broken down by statistical regularities and language background. *p < .10; **p < .05.

The processing advantages of MWSs across frequency bins ranged from 0.407 to 0.862 for L1 speakers and from 0.148 to 0.631 for L2 speakers. Wald tests revealed a marginally significant main effect of frequency for L1 speakers (F = 4.30, p = .079) and L2 speakers (F = 4.51, p = .062). Subsequent multiple comparisons revealed that the processing advantages of MWSs in the very-high- (L1 speakers: b = 0.455, 95% CI = [0.170, 0.740], p = .029; L2 speakers: b = 0.483, 95% CI = [0.239, 0.727], p = .006) and high-frequency bins (L1 speakers: b = 0.270, 95% CI = [0.086, 0.455], p = .040; L2 speakers: b = 0.351, 95% CI = [0.118, 0.584], p = .015) were significantly greater than that in the low-frequency bin (see Appendix S14 for multiple comparison results). We also found differences between L1 speakers and L2 speakers. Specifically, L1 speakers did not show a significantly greater processing advantage for MWSs in the medium-frequency bin than in the low-frequency bin, whereas L2 speakers did (b = 0.293, 95% CI = [0.097, 0.489], p = .029). In addition, L1 speakers showed significantly or marginally significantly greater advantages in processing MWSs in the very-high- (b = 0.407, 95% CI = [0.099, 0.714], p = .041) and high-frequency bins (b = 0.222, 95% CI = [0.027, 0.417], p = .080) relative to the medium-frequency bin, whereas L2 speakers did not show such significant differences. Taken together, such results suggest that the processing advantage of MWSs grows as frequency increases (see Figure 1).
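
The pairwise comparisons between bins rest on re-fitting the moderator model with a different reference level. The sketch below illustrates the idea under a deliberate simplification: it uses inverse-variance weighted means per bin with no random effects or covariates, so it is not the three-level model used in this study, and the effect sizes are hypothetical.

```python
from collections import defaultdict


def level_means(effects):
    """Inverse-variance weighted mean effect size for each moderator level."""
    num = defaultdict(float)
    den = defaultdict(float)
    for level, g, var in effects:
        weight = 1.0 / var
        num[level] += weight * g
        den[level] += weight
    return {level: num[level] / den[level] for level in num}


def contrasts(effects, reference):
    """Coefficient b for every level relative to the chosen reference level."""
    means = level_means(effects)
    return {
        level: mean - means[reference]
        for level, mean in means.items()
        if level != reference
    }


# Hypothetical data: (frequency bin, Hedges's g, sampling variance).
toy = [
    ("low", 0.10, 0.04), ("low", 0.20, 0.04),
    ("medium", 0.40, 0.04), ("medium", 0.50, 0.04),
    ("high", 0.70, 0.04), ("high", 0.60, 0.04),
]

vs_low = contrasts(toy, reference="low")        # medium and high compared with low
vs_medium = contrasts(toy, reference="medium")  # low and high compared with medium
print(vs_low, vs_medium)
```

Changing the reference level does not alter the fitted bin means; it only changes which differences appear as coefficients. A comparison therefore has the same magnitude and the opposite sign when the two bins swap roles.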

Regarding association strength, L1 speakers’ processing advantage of MWSs across association strength bins ranged from 0.481 to 0.760 for MI, from 0.320 to 0.686 for forward delta P, and from 0.438 to 0.662 for backward delta P. For L2 speakers, the processing advantage of MWSs ranged from 0.160 to 0.648 for MI, from 0.268 to 0.669 for forward delta P, and from 0.253 to 0.618 for backward delta P. Subsequent multiple comparisons revealed that both L1 speakers and L2 speakers showed significantly or marginally significantly greater processing advantages for MWSs in the medium MI bin than in the low- (L1 speakers: b = 0.260, 95% CI = [0.126, 0.394], p = .025; L2 speakers: b = 0.335, 95% CI = [0.111, 0.559], p = .033) and very-high MI bins (L1 speakers: b = 0.279, 95% CI = [0.034, 0.524], p = .052; L2 speakers: b = 0.487, 95% CI = [0.181, 0.794], p = .009). Moreover, L1 speakers and L2 speakers’ processing advantages of MWSs in the medium- (L1 speakers: b = 0.364, 95% CI = [0.180, 0.547], p = .027; L2 speakers: b = 0.396, 95% CI = [0.111, 0.681], p = .038) and high-forward-delta-P (L1 speakers: b = 0.366, 95% CI = [0.111, 0.621], p = .019; L2 speakers: b = 0.239, 95% CI = [0.015, 0.463], p = .088) bins were significantly or marginally significantly greater than those in the low-forward-delta-P bin.

However, there were notable differences in association measures between L1 speakers and L2 speakers. First, the Wald tests revealed marginally significant main effects of MI (F = 4.98, p = .051), forward delta P (F = 4.18, p = .073), and backward delta P (F = 3.17, p = .096) for L1 speakers but not for L2 speakers. Second, L2 speakers showed a smaller processing advantage for MWSs in the very-high MI bin than in the medium and high MI bins (medium: b = -0.487, 95% CI = [-0.794, -0.181], p = .009; high: b = -0.271, 95% CI = [-0.544, -0.002], p = .075), whereas L1 speakers did not exhibit such differences. Third, the multiple comparisons indicated that L2 speakers’ processing advantage for MWSs in the very-high forward delta P bin was marginally significantly smaller than that in the medium bin (b = -0.401, 95% CI = [-0.749, -0.053], p = .053). In contrast, L1 speakers showed a significantly greater processing advantage for MWSs in the very-high forward delta P bin than in the low bin (b = 0.359, 95% CI = [0.118, 0.599], p = .015). Last, L1 speakers’ processing advantage for MWSs in the medium backward delta P bin (b = 0.225, 95% CI = [0.081, 0.368], p = .046) was significantly greater than that in the low bin, whereas L2 speakers showed only a marginally significantly greater processing advantage for MWSs in the medium bin than in the very-high backward delta P bin (b = 0.365, 95% CI = [0.038, 0.692], p = .054). Analyses of the interactions between statistical regularities and language background revealed differences between L1 speakers and L2 speakers in terms of processing advantages of MWSs. Specifically, a significant difference was observed in the comparison between the low- and medium-frequency bins (b = 0.232, 95% CI = [0.081, 0.382], p = .012), and a marginally significant difference was observed in the comparison between the low and very-high forward delta P bins (b = -0.173, 95% CI = [-0.346, 0.000], p = .088; see Appendix S15).

For both L1 speakers and L2 speakers, collocations (L1 speakers: Hedges’s g = 0.629, 95% CI = [0.327, 0.930], p = .001; L2 speakers: Hedges’s g = 0.530, 95% CI = [0.199, 0.861], p = .013), lexical bundles (L1 speakers: Hedges’s g = 0.258, 95% CI = [0.138, 0.378], p = .025; L2 speakers: Hedges’s g = 0.308, 95% CI = [0.215, 0.401], p = .001), and phrasal verbs (L1 speakers: Hedges’s g = 3.309, 95% CI = [3.151, 3.468], p = .016; L2 speakers: Hedges’s g = 0.605, 95% CI = [0.520, 0.691], p = .046) demonstrated significant processing advantages. Moderator analyses of MWS type indicated that neither L1 speakers nor L2 speakers had a significant processing advantage for binominals (Figure 2). Subsequent multiple comparisons revealed that both L1 speakers and L2 speakers had a larger processing advantage for phrasal verbs than for idioms (L1 speakers: b = 2.927, 95% CI = [2.655, 3.199], p < .001; L2 speakers: b = 0.532, 95% CI = [0.395, 0.670], p < .001) and lexical bundles (L1 speakers: b = 3.051, 95% CI = [2.853, 3.250], p < .001; L2 speakers: b = 0.298, 95% CI = [0.171, 0.424], p = .004).

Figure 2. Processing advantages of MWSs broken down by MWS type, task explicitness, and language background. *p < .10; **p < .05.

Unlike L1 speakers (idiom: Hedges’s g = 0.383, 95% CI = [0.161, 0.604], p = .007), L2 speakers did not demonstrate a processing advantage for idioms, and L2 speakers’ processing advantages of collocations (b = 0.457, 95% CI = [0.109, 0.805], p = .022) and lexical bundles (b = 0.235, 95% CI = [0.092, 0.377], p = .008) were significantly greater than that of idioms. Furthermore, L1 speakers had a greater processing advantage for phrasal verbs than for collocations and binominals (collocation: b = 2.681, 95% CI = [2.341, 3.021], p = .001; binominal: b = 2.877, 95% CI = [2.481, 3.274], p < .001). Additionally, L1 speakers had a smaller (marginally significant) processing advantage for lexical bundles than for collocations (b = -0.370, 95% CI = [-0.695, -0.046], p = .077). Such differences were not significant in L2 speakers. Our analyses of the interaction between MWS type and language background revealed that L2 speakers exhibited more pronounced disadvantages in processing phrasal verbs (b = -2.612, 95% CI = [-2.737, -2.488], p < .001) relative to collocations than did L1 speakers. In contrast, L1 speakers experienced a marginally significantly larger disadvantage in processing lexical bundles relative to collocations than did L2 speakers (b = -0.123, 95% CI = [-0.237, -0.010], p = .084).

Last, regarding the moderating effect of task explicitness, both L1 speakers and L2 speakers showed significant or marginally significant processing advantages of MWSs for low- (L1 speakers: Hedges’s g = 0.419, 95% CI = [0.184, 0.653], p = .004; L2 speakers: Hedges’s g = 0.124, 95% CI = [0.017, 0.230], p = .054) and high-explicitness tasks (L1 speakers: Hedges’s g = 0.612, 95% CI = [0.328, 0.895], p < .001; L2 speakers: Hedges’s g = 0.426, 95% CI = [0.226, 0.625], p < .001). However, there was no moderating effect of task explicitness on the processing advantage of MWSs in L1 speakers (see Table 3), whereas the processing advantage of MWSs in L2 speakers significantly increased when experimental tasks were more explicit (b = 0.302, 95% CI = [0.076, 0.528], p = .020; see Table 4). The analysis examining the interaction between task explicitness and language background did not yield significant results.

Discussion

This study sought to measure the processing advantage of MWSs in comparison with novel word combinations. Additionally, we explored whether this advantage is influenced by statistical regularities, types of MWSs, and task explicitness. In the following sections, we summarize and discuss the findings that address these research questions, beginning with the patterns shared by L1 speakers and L2 speakers and then highlighting the differences observed between them.

The processing advantage of MWSs

Overall, results obtained in this meta-analysis support the assertion that MWSs, despite being a complex and multifaceted construct (Biber, 2009), enjoy a processing advantage over novel word combinations. The robustness of the processing advantage of MWSs also consolidates our belief that conventionalized multiword patterns can reduce our cognitive effort (Wray & Perkins, 2000) and help alleviate the time pressure placed on language users (Christiansen & Chater, 2016). Furthermore, our results suggest that various types of MWSs are represented in memory and are essential building blocks of language. Theoretically, our findings are in line with usage-based approaches (e.g., Christiansen & Arnon, 2017; Goldberg, 1995), which view language as an inventory of symbolic units of various sizes shaped by linguistic experience and predict that larger-than-word units are represented in the mental lexicon and share common cognitive mechanisms with single words.

In terms of the magnitude of effect sizes, Plonsky and Oswald (2014) provided benchmarks for within-subject studies, indicating that an effect size of 0.6 is small, 1.0 is medium, and 1.4 is large. However, it should be noted that these benchmarks are suitable for effect sizes in L2 research when accuracy is used as the outcome variable and may not be as applicable when dealing with RT measures (Avery & Marsden, 2019). Given that relatively few studies have reported effect sizes calculated from RTs using Cohen’s d family of effect sizes, it would be difficult to conclude that the processing advantage of MWSs found in our meta-analysis (overall: Hedges’s g = 0.417; 95% CI = [0.276, 0.558], p < .001; L1 speakers: Hedges’s g = 0.475, 95% CI = [0.313, 0.636], p < .001; L2 speakers: Hedges’s g = 0.348, 95% CI = [0.220, 0.475], p < .001) is small. After all, the relatively small effect size calculated from RTs may be due to the nature of the data because RTs have larger standard deviations relative to their means, which can decrease effect sizes calculated using Cohen’s d (Avery & Marsden, 2019; Brysbaert & Stevens, 2018).
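
The point about standard deviations can be illustrated with a short calculation. The sketch below computes Hedges’s g from summary statistics using the textbook independent-groups formula with the small-sample correction; this is an assumption made for simplicity, as the included studies largely used within-subject designs, and all RT values are hypothetical.

```python
import math


def hedges_g(mean_control, mean_target, sd_control, sd_target, n_control, n_target):
    """Hedges's g for two independent groups (positive = target condition faster)."""
    df = n_control + n_target - 2
    pooled_sd = math.sqrt(
        ((n_control - 1) * sd_control ** 2 + (n_target - 1) * sd_target ** 2) / df
    )
    d = (mean_control - mean_target) / pooled_sd  # Cohen's d
    correction = 1 - 3 / (4 * df - 1)             # small-sample correction factor J
    return correction * d


# Hypothetical RTs in ms: the same 40 ms advantage for MWSs over control phrases,
# once with a smaller and once with a larger standard deviation.
g_tight = hedges_g(640, 600, 80, 80, 30, 30)
g_noisy = hedges_g(640, 600, 160, 160, 30, 30)
print(round(g_tight, 3), round(g_noisy, 3))
```

Doubling the standard deviation halves the standardized effect even though the raw RT advantage is unchanged, which is why benchmarks derived from accuracy data may understate the practical size of RT-based effects.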