Perhaps the greatest challenge to creating a research timeline on teaching and learning collocation is deciding how wide to cast the net in the search for relevant publications. For one thing, the term ‘collocation’ does not have the same meaning for all (applied) linguists and practitioners (Barfield & Gyllstad 2009) (see timeline). For another, items that are labelled as collocations in one study may be called something else in another study (Wray 2000: 465).
In the discipline of corpus linguistics, collocation refers to the above-chance co-occurrence of two words (Sinclair 1991). The degree of likelihood of two words co-occurring in a corpus within a given span of discourse can be quantified through one of the available measures of collocational strength such as the mutual information (MI) score. The higher that score, the stronger the word partnership or collocation. Word substitutions that cause deviations from the regular co-occurrences (e.g. highly religious instead of deeply religious) will tend to stand out as unconventional or ‘non-idiomatic’ (where the term ‘idiomatic’ is used in the sense of ‘combining words like a native speaker would’).
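As a rough illustration of how such a measure works, the sketch below computes an MI score in one commonly used form: the base-2 logarithm of the observed co-occurrence frequency divided by the frequency expected if the two words were distributed independently. The function name `mi_score`, its parameters and all of the counts are hypothetical and chosen for illustration only; they are not taken from any study cited here. Published implementations may differ, for instance by adjusting for span size.

```python
import math


def mi_score(pair_count, count_w1, count_w2, corpus_size):
    """Return log2(observed / expected) for a word pair.

    Assumed formulation: expected = f(w1) * f(w2) / N, with no span adjustment.
    """
    # Co-occurrences expected by chance if the two words were independent.
    expected = count_w1 * count_w2 / corpus_size
    return math.log2(pair_count / expected)


# Invented counts for illustration only (not from a real corpus).
strong = mi_score(pair_count=50, count_w1=1_000, count_w2=500, corpus_size=1_000_000)
weak = mi_score(pair_count=5, count_w1=10_000, count_w2=500, corpus_size=1_000_000)

print(round(strong, 2), weak)
```

With these invented counts, the first pair co-occurs 100 times more often than chance would predict, whereas the second pair co-occurs exactly as often as chance would predict and so receives a score of zero.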
However, in the older discipline of phraseology research, collocations are usually considered a particular type of multiword expression, distinguishable from other types, most notably idioms (e.g. Howarth 1998; Gitsaki 1999: 3). The principal argument for making this distinction is that the meaning of some multiword expressions (e.g. cause damage) follows from adding up the meaning of their constituents, while the meaning of other multiword expressions (e.g. pull strings) transcends that of their constituent words. The former type is then labelled ‘collocation’ and the latter is labelled ‘idiom’. This commonly made distinction between collocations and idioms is paralleled in the realm of language education by the availability of study materials devoted separately to either collocations or idioms (e.g. McCarthy & O'Dell 2002, 2005).
The distinction between collocations and idioms on the basis of semantic transparency (or ‘compositionality’) is not black-and-white, however. For one thing, many so-called collocations are transparent only provided one is not led astray by the primary meaning of constituent words (e.g. pay in pay attention is not used in its financial transaction sense) (Boers & Webb 2015). For another, many expressions that are listed in idiom dictionaries are to some degree compositional. If pull strings evokes the image of a puppeteer in action, and if this aids interpretation of the expression, then the constituent words pull and strings do contribute to the meaning of the phrase as a whole (Gibbs 1994).
Using the above-chance co-occurrence of words as a (corpus-based) criterion naturally leads to the inclusion of expressions considered idioms in the phraseological tradition. For example, some of the target expressions labelled collocations in Webb, Newton & Chang's (2013) study (see timeline) are included in the Collins Cobuild Dictionary of Idioms (2002) (e.g. cut corners and stay the course) while other targets are not (e.g. buy time and run the risk). Conversely, given their relatively fixed nature, most idioms will conform to the corpus linguistic definition of collocation (e.g. vicious circle) (Macis & Schmitt 2017). We could therefore have cast our net so wide as to include publications with an explicit focus on idioms in second language (L2) learning. However, to keep the scope of this research timeline manageable, we have opted not to do that. The body of research on idiom comprehension and learning is large, and probably merits a research timeline of its own.
Apart from revealing the statistical likelihood that certain words will occur in each other's company (e.g. that pretty is much more likely to co-occur with girl than with boy), corpus data can also be used to make inventories of continuous strings of two or more words (n-grams) that meet a given frequency criterion. Such highly frequent strings have been called lexical bundles (Biber, Conrad & Cortes 2004). The resulting inventories will contain sequences such as ‘and so on’ and ‘one of the’, which consist of words that are so common that likelihood-of-co-occurrence statistics (e.g. MI scores) will often fail to reach significance (owing to the fact that these words are found in the company of just about any other word in a corpus). Despite the value in this line of research, we have also excluded publications with a particular focus on lexical bundles. Among these are several corpus-informed attempts to create inventories of uninterrupted word sequences that could be given priority in learning by virtue of their high frequency (Shin & Nation 2008; Simpson-Vlach & Ellis 2010; Liu 2012; Martinez & Schmitt 2012).
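The frequency-based procedure for identifying such strings can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any study cited above: the function name `frequent_ngrams`, the raw-count threshold `min_count` and the toy token list are all hypothetical, and published inventories typically apply further criteria (for example, frequencies normalised per million words and a minimum spread across texts).

```python
from collections import Counter


def frequent_ngrams(tokens, n, min_count):
    """Return contiguous n-word strings that occur at least min_count times."""
    # Slide a window of n tokens across the text to get every contiguous n-gram.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(ngrams)
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}


# Toy text for illustration only.
tokens = "one of the best and one of the worst and so on and so on".split()
bundles = frequent_ngrams(tokens, n=3, min_count=2)

print(bundles)
```

In this toy text, only the three-word strings ‘one of the’ and ‘and so on’ reach the threshold of two occurrences, so only they are returned.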
The phenomenon of collocation is of course part and parcel of formulaic language in general. A fair number of studies have explored the learning and teaching of ‘formulaic sequences’ (Wray 2000), encompassing diverse multiword expressions, often identified or selected by the researchers on the basis of intuition (and inter-coder agreement) instead of corpus data. We have also decided against including this line of research in our timeline, because a separate timeline devoted to formulaic language is in fact already available in the present journal (Wray 2013).
Still, we fully recognize that giving precedence in our research timeline to studies which explicitly focus on ‘collocation’ is at the expense of multiple other publications that offer valuable insights into the nature of phraseology more generally and into the challenges that particular types of multiword expressions (e.g. idioms) pose for L2 learners.
Turning now to our timeline, it is striking that interest in collocation in the context of L2 learning initially developed very slowly. The pace of research only began to pick up in the late 1990s, possibly spurred on by Nattinger & DeCarrico's (1992) and Lewis's (1993, 1997, 2000) seminal works that highlighted the relevance of multiword lexis for L2 learners. The proliferation of research on collocation learning and teaching since the late 1990s has been astounding, however, with a particularly rapid rise in numbers of studies in the past decade. There is no doubt that the interval between the creation of this timeline and its publication will see more publications on the subject. As a whole, the timeline shows a progression in research from studies that provide evidence of the importance of collocation for L2 learners and the slow pace of L2 collocation learning in the absence of pedagogic intervention, to studies that evaluate the effectiveness of various types of intervention, ranging from relatively unobtrusive manipulations of input (e.g. textual enhancement) to explicit collocation-focused exercises.
The publications included in this timeline cover the following three broad themes, and each publication is classified according to the most relevant one(s).
A Demonstrating the usefulness of L2 collocation knowledge. These are publications that show strong associations between learners’ mastery of collocation and their general levels of (speaking and/or writing) proficiency.
B Assessing L2 learners’ collocation knowledge. This theme includes comparisons of natives’ and learners’ use of collocation, and also the development and validation of test instruments to measure collocation knowledge.
C Investigating factors that influence the pace of acquisition of (types of) collocations, and pedagogic interventions to accelerate learning. This broad category comprises studies which gauge the impact of variables such as first language (L1)-L2 (non-)congruency and frequency of encounters on learners’ (incidental) uptake of L2 collocations, as well as studies that evaluate the effectiveness of collocation-focused instructional procedures.
Biber, D., Conrad, S. & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics
Boers, F. & Webb, S. (2015). Gauging the semantic transparency of idioms: Do natives and learners see eye to eye? In Heredia, R. & Cieslicka, A. (eds.), Bilingual figurative language processing. Cambridge University Press, 368–392.
Collins Cobuild dictionary of idioms (2002, 2nd edn.). Glasgow: HarperCollins.
Gibbs, R. W. (1994). The poetics of mind: Figurative thought, language and understanding. Cambridge: Cambridge University Press.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of collocational knowledge. San Francisco: International Scholars Publications.
Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics
Lewis, M. (1993). The lexical approach. Hove: Language Teaching Publications.
Lewis, M. (1997). Implementing the lexical approach. Hove: Language Teaching Publications.
Lewis, M. (ed.) (2000). Teaching collocations. Hove: Language Teaching Publications.
Liu, D. (2012). The most frequently-used multiword constructions in academic written English: A multi-corpus study. English for Specific Purposes
Macis, M. & Schmitt, N. (2017). Not just ‘small potatoes’: Knowledge of the idiomatic meanings of collocations. Language Teaching Research
Martinez, R. & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics
McCarthy, M. & O'Dell, F. (2002). English idioms in use. Cambridge: Cambridge University Press.
McCarthy, M. & O'Dell, F. (2005). English collocations in use: Intermediate. Cambridge: Cambridge University Press.
Nattinger, J. R. & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.
Shin, D. & Nation, P. (2008). Beyond single words: The most frequent collocations in spoken English. ELT Journal
Simpson-Vlach, R. & Ellis, N. C. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Wray, A. (2000). Formulaic sequences in second language teaching: Principles and practice. Applied Linguistics
Wray, A. (2013). Formulaic language. Language Teaching
Frank Boers is an Associate Professor at the School of Linguistics and Applied Language Studies of Victoria University of Wellington. His initial research endeavours concerned lexicology, semantics and rhetoric (e.g. studies of metaphor). Most of his more recent research interests, however, were sparked by his experience as a language teacher and teacher trainer. He now publishes mostly on matters of instructed second language acquisition, often regarding vocabulary and phraseology. Some of the latter work has appeared in journals such as Applied Linguistics and Language Teaching Research.
Stuart Webb is a Professor of Applied Linguistics in the Faculty of Education at the University of Western Ontario. His research interests include vocabulary studies, extensive reading and listening, and language learning through watching television. His articles have been published in journals such as Applied Linguistics and Language Learning. His latest book (with Paul Nation) is How vocabulary is learned (Oxford University Press, 2017).