Comparing Baselines for Corpus Analysis

doi:10.1017/9781108589314.005

4 - Comparing Baselines for Corpus Analysis

Research into the Get-Passive in Speech and Writing

from Part II - Selection, Calibration and Preparation of Corpus Data

Published online by Cambridge University Press: 06 May 2022

Sean Wallis and Seth Mehl

Edited by Ole Schützler (Universität Leipzig) and Julia Schlüter (Universität Bamberg)


Summary

The authors review different baselines for the study of alternant choices, emphasizing that normalization to a standard number of words – while straightforward in its application – will in many cases not provide a meaningful measure of frequency. Instead, it is argued, we need a baseline indicating opportunities of use, such as phrase or sentence counts. Exemplifying their proposal with reference to get- and be-passives and the presence or absence of agentive by-phrases, the authors demonstrate a sequence of measures taken to make the quantities that are compared more meaningful and defensible, based on linguistically informed selections of baseline quantities (number of main verbs, passives or potentially alternating passives). Crucially, this process must involve a categorization of observations by the researcher to ensure that mutual substitution is plausible in each case. To calibrate this manual data verification exercise to a manageable level, the authors apply a method of uneven category sub-sampling to the data, and use it to adjust variance estimates and confidence intervals in their analysis.
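The contrast between the baselines named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, every count is an invented placeholder rather than a figure from ICE-GB, and the interval is the standard Wilson score interval for a single proportion. The sub-sampling adjustment to the variance described in the chapter is not reproduced here.

```python
import math


def per_million_words(hits: int, words: int) -> float:
    """Frequency normalized to a standard number of words (here, one million)."""
    if words <= 0:
        raise ValueError("words must be positive")
    return hits * 1_000_000 / words


def wilson_interval(hits: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for the proportion hits / n.

    n is the baseline: the number of opportunities in which the form
    could have been chosen. z defaults to the two-sided 95% critical value.
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Invented placeholder counts, for illustration only (not corpus data).
    get_passives = 120
    baselines = {
        "main verbs": 150_000,
        "all passives": 9_000,
        "alternating passives": 6_000,
    }
    print("per million words:", per_million_words(get_passives, 1_000_000))
    for name, n in baselines.items():
        low, high = wilson_interval(get_passives, n)
        print(f"{name}: p = {get_passives / n:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

The same number of hits yields a different proportion and a different interval width for each baseline, which is why the choice of baseline has to be argued on linguistic grounds before the quantities are compared.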

Keywords

Wilson score interval, subsampling, baseline, get-passive, be-passive, Fuzzy Tree Fragment (FTF), ICE-GB

Type: Chapter
Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 101–126

DOI: https://doi.org/10.1017/9781108589314.005

Publisher: Cambridge University Press

Print publication year: 2022


