Introduction
Over the last few decades, corpus-linguistic methods have established themselves as among the most powerful and versatile tools for studying language acquisition, processing, variation, and change. This development has been driven in particular by the following factors:
a. technological progress (e.g., processor speeds as well as hard drive and RAM sizes);
b. methodological progress (e.g., the development of software tools, programming languages, and statistical methods);
c. a growing desire by many linguists for (more) objective, quantifiable, and replicable findings as an alternative to, or at least as an addition to, intuitive acceptability judgments (see Chapter 3);
d. theoretical developments such as the growing interest in cognitively and psycholinguistically motivated approaches to language, in which frequency of (co-)occurrence plays an important role in language acquisition, processing, use, and change.
In this chapter, we will discuss a necessarily small selection of issues regarding (i) the creation, or compilation, of new corpora and (ii) the use of corpora once they have been compiled. Although this chapter covers both the creation and the use of corpora, we do not expect any individual researcher to be engaged in both kinds of activity. Creating and using corpora call for different skills, a point noted by Sinclair (2005: 1), who draws attention to the potential pitfalls of a corpus analyst building a corpus: specifically, the danger that the corpus will be constructed in a way that can only serve to confirm the analyst’s pre-existing expectations. Some of the issues addressed in this chapter are also dealt with fairly succinctly in Wynne (2005), McEnery, Xiao, and Tono (2006), and McEnery and Hardie (2012), and more thoroughly in Lüdeling and Kytö (2008a, 2008b) and Beal, Corrigan, and Moisl (2007a, 2007b).