Book contents
- Data and Methods in Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Acknowledgements
- Introduction: Comparative Approaches to Data and Methods in Corpus Linguistics
- Part I Corpus Dimensions and the Viability of Methodological Approaches
- Part II Selection, Calibration and Preparation of Corpus Data
- Part III Perspectives on Multifactorial Methods
- Part IV Applications of Classification-Based Approaches
- 10 Comparing Corpus-Driven and Corpus-Based Approaches to Diachronic Variation
- 11 Comparing Annotation Types and n-Gram Sizes
- Index
- References
10 - Comparing Corpus-Driven and Corpus-Based Approaches to Diachronic Variation
Grammatical Changes in Late Modern and Present-Day English
from Part IV - Applications of Classification-Based Approaches
Published online by Cambridge University Press: 06 May 2022
Summary
Focusing on grammatical changes in Late Modern and Present-Day English, the author applies a corpus-driven method to texts from two diachronic corpora, A Representative Corpus of Historical English Registers (ARCHER) and the Corpus of Historical American English (COHA). He compares his findings with those returned by more conventional corpus-based methods, which can be characterized as hypothesis-driven. To this end, the study employs automated profiling of large feature sets, such as word- and POS-based mono-, bi- and trigrams, chunks, syntactic dependency labels, and measures of constituent order and length. The derived feature profiles are combined in a supervised classification task with a given division of texts into earlier and later corpus subperiods to reveal patterns of over- and underuse. Structures that are profiled as over- or under-represented in the diachronic subsections are then browsed for grammatical changes that may have been missed by previous research. According to the author, an advantage of such approaches is that they are theory-neutral and may generate novel hypotheses for investigation. These hypotheses may then serve as input to further corpus-based approaches.
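The profiling idea described in the summary can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes naive whitespace tokenisation, uses only word-based mono-, bi- and trigrams, and substitutes a smoothed log ratio of relative frequencies for the supervised classifier. The function names (`ngrams`, `profile`, `log_ratios`) and the toy sentences are invented for illustration and are not drawn from ARCHER or COHA.

```python
from collections import Counter
from math import log


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile(texts, max_n=3):
    """Count word-based mono-, bi- and trigrams over a list of texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumption: naive whitespace tokenisation
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def log_ratios(earlier, later, smoothing=0.5):
    """Smoothed log ratio of relative frequencies, later vs. earlier subperiod.

    Positive scores mark features over-represented in the later subperiod,
    negative scores mark features under-represented in it. This simple
    statistic stands in for classifier feature weights (an assumption).
    """
    features = set(earlier) | set(later)
    total_e = sum(earlier.values()) + smoothing * len(features)
    total_l = sum(later.values()) + smoothing * len(features)
    scores = {}
    for f in features:
        p_e = (earlier[f] + smoothing) / total_e
        p_l = (later[f] + smoothing) / total_l
        scores[f] = log(p_l / p_e)
    return scores


# Toy, invented sentences standing in for two corpus subperiods.
earlier_texts = ["he shall go to the town", "we shall see the house"]
later_texts = ["he is going to go to the town", "we are going to see the house"]

scores = log_ratios(profile(earlier_texts), profile(later_texts))
ranked = sorted(scores.items(), key=lambda kv: kv[1])

# Features at the ends of the ranking are candidates for manual inspection.
for feature, score in ranked[:3] + ranked[-3:]:
    print(" ".join(feature), round(score, 2))
```

In a fuller setup, the same feature counts would feed a supervised classifier trained on the earlier/later division, and POS tags, chunks and dependency labels would be profiled alongside word n-grams.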
- Type
- Chapter
- Information
- Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 291-322
- Publisher: Cambridge University Press
- Print publication year: 2022