
8 - Using web-based data for the study of global English

Published online by Cambridge University Press:  05 June 2014

Marianne Hundt
Affiliation:
English Department, University of Zurich, Switzerland
Manfred Krug
Affiliation:
Otto-Friedrich-Universität Bamberg, Germany
Julia Schlüter
Affiliation:
Otto-Friedrich-Universität Bamberg, Germany

Summary

Introduction

According to Biber et al. (1998: 4), a corpus is ‘a large and principled collection of natural texts’ (my emphasis). This definition obviously does not apply to the huge collection of texts that the World Wide Web constitutes, and in narrower corpus-linguistic terms the web therefore cannot be considered a corpus. However, the data available on the web have been used increasingly in corpus-linguistic investigations. This chapter focuses on why this is the case, how it can be done, and what the gains and limitations of using web-based data for linguistic research are.

There are several reasons why linguists have turned to the World Wide Web as a source of data. For the study of some phenomena, even large corpora comprising 100 million words or more are still not large enough. This holds for most kinds of lexicographic research, but investigating some of the more ephemeral points in English grammar may also necessitate larger sources of data. In addition, the internet has given rise to new text types such as e-mail, chat-room discussions, text messaging, blogs, or interactive internet magazines – text types that are interesting objects of study in themselves (e.g. Herring and Paolillo 2006; Tagliamonte 2008). Another reason for the allure of the World Wide Web is that it takes a long time and considerable financial resources to compile standard reference corpora. Moreover, these representative corpora are quickly out of date when it comes to recent or ongoing change; Baker (2009) describes how the internet can be used to supplement existing standard corpora in this respect. Furthermore, apart from the International Corpus of English (ICE), corpus linguistics has largely focused on so-called inner-circle varieties of English, i.e. varieties of English as a first language; moreover, within the inner circle, the focus has been mostly on British (BrE) and American English (AmE). For even slightly more exotic varieties of English – like Bangladeshi or Pakistani English – we do not even have ICE components and are very unlikely to see them in the (near) future. The discussion in this chapter also applies in large part to the recently released Corpus of Global Web-Based English (GloWbE) (see corpus2.byu.edu/glowbe), a web-derived corpus of world Englishes.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2013


References

Hoffmann, Sebastian 2007b. ‘Processing Internet-derived text: creating a corpus of Usenet messages’, Literary and Linguistic Computing 22(2): 151–165.
Hundt, Marianne, Nesselhauf, Nadja and Biewer, Carolin (eds.) 2007. Corpus linguistics and the Web. Amsterdam: Rodopi.
Volk, Martin 2001. ‘Exploiting the WWW as a corpus to resolve PP attachment ambiguities’, in Rayson, Paul, Wilson, Andrew, McEnery, Tony, Hardie, Andrew and Khoja, Shereen (eds.), Proceedings of the Corpus Linguistics 2001 conference. Lancaster, 30 March – 2 April 2001. Department of Linguistics. No pagination.
