Book contents
- Frontmatter
- Contents
- List of figures
- List of tables
- Acknowledgments
- The road not taken
- Introduction
- 1 The contemporary marketplace of ideas about language
- 2 Saussure
- 3 Evidence from linguistic survey research: basic description
- 4 Statistical evidence from linguistic survey research
- 5 Evidence from corpus linguistics
- 6 Speech as a complex system
- 7 Speech perception
- 8 Speech models and applications
- References
- Index
5 - Evidence from corpus linguistics
Published online by Cambridge University Press: 03 July 2009
Summary
The use of computers, through their storage capacity rather than their processing ability, begins to address Saussure's statement that “language in its totality is unknowable.” At the beginning of the computer age, processing dominated computer applications because memory was quite limited, whether in RAM or on longer-term media such as tape or disk. Now, however, mass storage is far more widely available, so it is possible to create enormous corpora of language data, whether as sound files or as text files. At this writing, networked storage arrays allow linguists to build corpora many terabytes in size. By contrast, the largest storage device available to the Linguistic Atlas Project when it began to be computerized in the early 1980s held ten megabytes. That is growth by a factor of a million over a quarter century. One million words of running text occupies about six or seven megabytes. That is the size of the 1961-vintage Brown Corpus of American English, of the parallel LOB Corpus of British English, and of each of their 1990s replications, the Freiburg-Brown (Frown) Corpus and the Freiburg-LOB (FLOB) Corpus (all available in ICAME 1999). A corpus of that size was massive storage in the 1960s and took up most of a computer's hard drive in the mid-1980s, but it is now easily manageable on even the smallest storage devices. Dictionaries demand large corpora, in which as many different words of the language as possible may be found.
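The two storage figures in the summary can be checked with a little arithmetic. The following is a minimal sketch, not part of the chapter. It makes three assumptions: it uses decimal (SI) units, it treats a hypothetical 10-terabyte array as a stand-in for "many terabytes", and its bytes-per-word figure is simply the summary's six or seven megabytes divided by one million words, not a measured value. The helper names are illustrative only.

```python
# Assumption: decimal (SI) storage units; binary units (MiB, TiB) would shift the figures slightly.
MB = 10**6
TB = 10**12


def growth_factor(old_bytes: int, new_bytes: int) -> float:
    """Return how many times larger the new storage capacity is than the old one."""
    return new_bytes / old_bytes


def corpus_size_bytes(n_words: int, bytes_per_word: float) -> float:
    """Approximate plain-text size of a corpus from an average bytes-per-word figure."""
    return n_words * bytes_per_word


# Ten megabytes (early 1980s) versus a hypothetical 10-terabyte array (late 2000s).
factor = growth_factor(10 * MB, 10 * TB)
print(f"growth factor: {factor:,.0f}")

# One million words at an assumed 6 or 7 bytes per word
# (letters plus spaces, punctuation, and any markup).
for bpw in (6, 7):
    print(f"{bpw} bytes/word -> {corpus_size_bytes(1_000_000, bpw) / MB:.0f} MB")
```

Under these assumptions the growth factor comes out at exactly one million, and a one-million-word corpus falls in the six-to-seven-megabyte range the summary gives.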
- Type: Chapter
- Information: The Linguistics of Speech, pp. 146-173. Publisher: Cambridge University Press. Print publication year: 2009