
Part I - Corpus Dimensions and the Viability of Methodological Approaches

Published online by Cambridge University Press:  06 May 2022

Ole Schützler, Universität Leipzig
Julia Schlüter, Universität Bamberg
Type: Chapter
Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 15–72
Publisher: Cambridge University Press
Print publication year: 2022


References

Further Reading

Cochran, William G. 1983. Planning and Analysis of Observational Studies. New York: John Wiley & Sons. Chapters 1, 2, 3, 4, 6.
Greenland, Sander, Mansournia, Mohammad Ali and Altman, Douglas G. 2016. Sparse Data Bias: A Problem Hiding in Plain Sight. British Medical Journal 352. i1981. https://doi.org/10.1136/bmj.i1981.
Johnson, Daniel E. 2014. Progress in Regression: Why Natural Language Data Calls for Mixed-Effects Models. Unpublished manuscript. www.danielezrajohnson.com/johnson_2014b.pdf.
Winter, Bodo. 2020. Statistics for Linguistics. New York: Routledge. Chapters 14 and 15.

References

Algeo, John. 2006. British or American English? A Handbook of Word and Grammar Patterns. Cambridge: Cambridge University Press.
Barth, Danielle, and Kapatsinski, Vsevolod. 2018. Evaluating Logistic Mixed-Effects Models of Corpus-Linguistic Data in Light of Lexical Diffusion. In Speelman, Dirk, Heylen, Kris and Geeraerts, Dirk, eds. Mixed-Effects Regression Models in Linguistics. New York: Springer. 99–116.
Biber, Douglas. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, Douglas, and Gray, Bethany. 2013. Being Specific about Historical Change: The Influence of Sub-Register. Journal of English Linguistics 41(2). 104–34. http://eng.sagepub.com/cgi/doi/10.1177/0075424212472509.
Burnard, Lou, ed. 2007. Reference Guide for the British National Corpus (XML edition). British National Corpus Consortium & Research Technologies Service at Oxford University Computing Services. www.natcorp.ox.ac.uk/docs/URG/BNCdes.html.
Cruttenden, Alan. 2014. Gimson’s Pronunciation of English. 8th ed. London: Arnold.
Davies, Mark. 2008–. The Corpus of Contemporary American English (COCA): 600 Million Words, 1990–Present. www.english-corpora.org/coca.
Desgraupes, Bernard, and Loiseau, Sylvain. 2018. rcqp: Interface to the Corpus Query Protocol. R package version 0.5. https://CRAN.R-project.org/package=rcqp.
Elwert, Felix, and Winship, Christopher. 2014. Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable. Annual Review of Sociology 40. 31–53. https://doi.org/10.1146/annurev-soc-071913-043455.
Firth, David. 1993. Bias Reduction of Maximum Likelihood Estimates. Biometrika 80(1). 27–38. https://doi.org/10.2307/2336755.
Gelman, Andrew, and Greenland, Sander. 2019. Are Confidence Intervals Better Termed ‘Uncertainty Intervals’? British Medical Journal 366. l5381. https://doi.org/10.1136/bmj.l5381.
Greenland, Sander, Mansournia, Mohammad Ali and Altman, Douglas G. 2016. Sparse Data Bias: A Problem Hiding in Plain Sight. British Medical Journal 352. i1981. https://doi.org/10.1136/bmj.i1981.
Hiltunen, Turo, McVeigh, Joe and Säily, Tanja. 2017. How to Turn Linguistic Data into Evidence? In Hiltunen, Turo, McVeigh, Joe and Säily, Tanja, eds. Big and Rich Data in English Corpus Linguistics: Methods and Explorations. Studies in Variation, Contacts and Change in English 19. www.helsinki.fi/varieng/series/volumes/19/introduction.html.
Johnson, Daniel E. 2014. Progress in Regression: Why Natural Language Data Calls for Mixed-Effects Models. Unpublished manuscript. www.danielezrajohnson.com/johnson_2014b.pdf.
Jones, Daniel. 2011. English Pronouncing Dictionary (EPD). Edited by Roach, Peter, Setter, Jane and Esling, John. 18th ed. Cambridge: Cambridge University Press. CD-ROM edition.
Koplenig, Alexander. 2017. The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets: Reconstructing the Composition of the German Corpus in Times of WWII. Digital Scholarship in the Humanities 32(1). 169–88. https://doi.org/10.1093/llc/fqv037.
Lass, Roger, and Laing, Margaret. 2010. In Celebration of Early Middle English ‘H’. Neuphilologische Mitteilungen 111(3). 345–54.
Michel, Jean-Baptiste, Shen, Yuan Kui, Aiden, Aviva Presser et al. 2010. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331(6014). 176–82. https://doi.org/10.1126/science.1199644.
Minkova, Donka, ed. 2009. Phonological Weakness in English: From Old to Present-Day English. Basingstoke and New York: Palgrave Macmillan.
Minkova, Donka. 2014. A Historical Phonology of English. Edinburgh: Edinburgh University Press.
OED (Oxford English Dictionary Online). 2000–. Oxford: Oxford University Press. http://dictionary.oed.com/ (accessed 3 March 2020).
Pechenick, Eitan, Danforth, Christopher M. and Dodds, Peter Sheridan. 2015. Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution. PLoS ONE 10(10). e0137041. https://doi.org/10.1371/journal.pone.0137041.
Peters, Pam. 2004. The Cambridge Guide to English Usage. Cambridge: Cambridge University Press.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. www.R-project.org/.
Scherer, Ralph. 2018. PropCIs: Various Confidence Interval Methods for Proportions. R package version 0.3-0. https://CRAN.R-project.org/package=PropCIs.
Schlüter, Julia. 2019. Tracing the (Re-)Emergence of /h/ and /j/ through 350 Years of Books: Mergers and Merger Reversals at the Interface of Phonetics and Phonology. Folia Linguistica 40(s1). Special issue on diachronic phonotactics. Edited by Nikolaus Ritt, Andreas Baumann and Christina Prömer. 177–202. https://doi.org/10.1515/flih-2019-0009.
Schlüter, Julia, and Vetter, Fabian. 2020. An Interactive Visualization of Google Books Ngrams with R and Shiny: Exploring a(n) Historical Increase in Onset Strength in a(n) Huge Database. Journal of Data Mining and Digital Humanities 21. Special issue on visualizations in historical linguistics. Edited by Benjamin Molineaux, Bettelou Los and Martti Mäkinen. https://jdmdh.episciences.org/7000.
Speelman, Dirk, Heylen, Kris and Geeraerts, Dirk, eds. 2018. Mixed-Effects Regression Models in Linguistics. New York: Springer.
Steel, E. Ashley, Liermann, Martin and Guttorp, Peter. 2019. Beyond Calculations: A Course in Statistical Thinking. The American Statistician 73. 392–401. https://doi.org/10.1080/00031305.2018.1505657.
Wells, John. 2008. Longman Pronunciation Dictionary (LPD). 3rd ed. Harlow: Pearson Longman. CD-ROM edition: Longman Pronunciation Coach.
Winter, Bodo. 2020. Statistics for Linguistics. New York: Routledge.
Winter, Bodo, and Grice, Martine. 2021. Independence and Generalizability in Linguistics. Linguistics 59(5). 1251–77.

Further Reading

Cole, Jennifer, and Shattuck-Hufnagel, Stefanie. 2016. New Methods for Prosodic Transcription: Capturing Variability as a Source of Information. Laboratory Phonology 7(1). 8. https://doi.org/10.5334/labphon.29.
Durand, Jacques. 2017. Corpus Phonology. In Oxford Research Encyclopedia of Linguistics. https://oxfordre.com/linguistics/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-145.
Hoffmann, Sebastian, and Arndt-Lappe, Sabine. 2021. Better Data for More Researchers: Using the Audio Features of BNCweb. ICAME Journal 45. 125–54.
Schlüter, Julia. 2015. Rhythmic Influence on Grammar: Scope and Limitations. In Vogel, Ralf and van de Vijver, Ruben, eds. Rhythm in Cognition and Grammar: A Germanic Perspective. (Trends in Linguistics. Studies and Monographs 286). Berlin/New York: de Gruyter Mouton. 179–205.
Shih, Stephanie S. 2017. Phonological Influences in Syntactic Alternations. In Gribanova, Vera and Shih, Stephanie S., eds. The Morphosyntax-Phonology Connection. Oxford: Oxford University Press. 223–52.

References

Arndt-Lappe, Sabine, and Ernestus, Mirjam. 2020. Morphology–Phonology Interaction: The Role of Lexical Storage. In Pirelli, Vito, Plag, Ingo and Dressler, Wolfgang U., eds. Word Knowledge and Word Usage: A Cross-disciplinary Guide to the Mental Lexicon. Berlin/New York: de Gruyter Mouton. 191–227. www.degruyter.com/document/doi/10.1515/9783110440577-006/html.
Arnon, Inbal, and Snider, Neal. 2010. More Than Words: Frequency Effects for Multi-word Phrases. Journal of Memory and Language 62(1). 67–82.
Azzabou-Kacem, Soundess. 2018. Stress Shift in English Rhythm Rule Environments: Effects of Prosodic Boundary Strength and Stress Clash Types. Doctoral thesis. Edinburgh: University of Edinburgh.
Baayen, R. Harald. 2008. Analyzing Linguistic Data. Cambridge: Cambridge University Press.
Blass, Anne-Katrin. 2020. The Big Mess Construction: Forms and Functions in Present-Day English. Doctoral dissertation. Trier: Trier University. https://ubt.opus.hbz-nrw.de/opus45-ubtr/frontdoor/index/index/docId/1427.
BNC Consortium, Bodleian Libraries, Oxford. 2007. The British National Corpus, version 3 (BNC XML Edition). www.natcorp.ox.ac.uk.
Boersma, Paul, and Weenink, David J. M. 2021. Praat: Doing Phonetics by Computer, version 6.1.40. Computer program. www.praat.org.
Bolinger, Dwight. 1958. A Theory of Pitch Accent in English. Word 14(2–3). 119–49.
Bolinger, Dwight. 1965. Forms of English: Accent, Morpheme, Order. Cambridge, MA: Harvard University Press.
Bolinger, Dwight. 1981. Two Kinds of Vowels, Two Kinds of Rhythm. Bloomington, IN: Indiana University Linguistics Club.
Breiss, Canaan, and Hayes, Bruce. 2020. Phonological Markedness Effects in Sentence Formation. Language 96(2). 338–70.
Burnard, Lou, ed. 2007. Reference Guide for the British National Corpus (XML Edition). www.natcorp.ox.ac.uk/docs/URG.
Bybee, Joan. 1999. Usage-Based Phonology. In Darnell, Michael, Moravcsik, Edith A., Noonan, Michael, Newmeyer, Frederick J. and Wheatley, Kathleen, eds. Functionalism and Formalism in Linguistics, vol. 1, General Papers. Amsterdam: John Benjamins. 211–42.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Bybee, Joan, and Beckner, Clay. 2015. Usage-Based Theory. In Heine, Bernd and Narrog, Heiko, eds. The Oxford Handbook of Linguistic Analysis. 2nd ed. Oxford: Oxford University Press. 953–79.
Coleman, John, Baghai-Ravary, Ladan, Pybus, John and Grau, Sergio. 2012. Audio BNC: The Audio Edition of the Spoken British National Corpus. Oxford: Phonetics Laboratory, University of Oxford.
Coleman, John, Liberman, Mark Y., Kochanski, Greg, Burnard, Lou and Yuan, Jiahong. 2011. Mining a Year of Speech. Paper presented at VLSP 2011: New Tools and Methods for Very-Large-Scale Phonetics Research. University of Pennsylvania, 29–31 January 2011.
Davies, Mark. 2008–. The Corpus of Contemporary American English: 450 Million Words, 1990–Present. www.english-corpora.org/coca.
Grabe, Ester, and Warren, Paul. 1995. Stress Shift: Do Speakers Do It or Do Listeners Hear It? In Connell, Bruce and Arvaniti, Amalia, eds. Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV. Cambridge: Cambridge University Press. 95–110.
Gussenhoven, Carlos. 1991. The English Rhythm Rule as an Accent Deletion Rule. Phonology 8(1). 1–35.
Hammond, Michael. 2014. Phonological Complexity and Input Optimization. Phonological Studies 17. 85–94.
Hammond, Michael. 2016. Input Optimisation: Phonology and Morphology. Phonology 33(3). 459–91.
Harrell, Frank E., Jr. 2020. rms: Regression Modeling Strategies. Manual. https://CRAN.R-project.org/package=rms.
Hoffmann, Sebastian, and Arndt-Lappe, Sabine. 2021. Better Data for More Researchers: Using the Audio Features of BNCweb. ICAME Journal 45. 125–54.
Hoffmann, Sebastian, and Evert, Stefan. 2006. BNCweb (CQP-edition): The Marriage of Two Corpus Tools. In Braun, Sabine, Kohn, Kurt and Mukherjee, Joybrato, eds. Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. (English Corpus Linguistics 3). Frankfurt am Main: Peter Lang. 177–95.
Hoffmann, Sebastian, Evert, Stefan, Smith, Nicholas, Lee, David and Prytz, Ylva Berglund. 2008. Corpus Linguistics with BNCweb. A Practical Guide. Frankfurt am Main: Peter Lang.
Keating, Patricia, and Shattuck-Hufnagel, Stefanie. 2002. A Prosodic View of Word Form Encoding for Speech Production. UCLA Working Papers in Phonetics 101. 112–56.
Kentner, Gerrit. 2015. Stress Clash Hampers Processing of Noncanonical Structures in Reading. In Vogel, Ralf and van de Vijver, Ruben, eds. Rhythm in Cognition and Grammar: A Germanic Perspective. Trends in Linguistics. Studies and Monographs (TiLSM) 286. Berlin: de Gruyter Mouton. 111–35.
Leech, Geoffrey, Garside, Roger and Bryant, Michael. 1994. CLAWS4: The Tagging of the British National Corpus. Proceedings of the 15th International Conference on Computational Linguistics (COLING 94). Kyoto, Japan. 622–28.
Lehmann, Hans-Martin, Schneider, Peter and Hoffmann, Sebastian. 2000. BNCweb. In Kirk, John, ed. Corpora Galore: Analysis and Techniques in Describing English. Amsterdam: Rodopi. 259–66.
Levelt, Willem J. M. 1989. Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Levelt, Willem J., Roelofs, Ardi and Meyer, Antje S. 1999. A Theory of Lexical Access in Speech Production. The Behavioral and Brain Sciences 22(1). 1–38; discussion 38–75.
Liberman, Mark Y., and Prince, Alan. 1977. On Stress and Linguistic Rhythm. Linguistic Inquiry 8. 249–336.
Lohmann, Arne. 2014. English Co-ordinate Constructions. A Processing Perspective on Constituent Order. Cambridge: Cambridge University Press.
Nespor, Marina, and Vogel, Irene. 1986. Prosodic Phonology. Dordrecht: Foris.
Pawley, Andrew, and Syder, Francis H. 1983. Two Puzzles for Linguistic Theory: Nativelike Selection and Nativelike Fluency. In Richards, Jack C. and Schmidt, Richard W., eds. Language and Communication. London: Longman. 191–226.
Pitt, Mark, Dilley, Laura C., Johnson, Keith et al. 2007. Buckeye Corpus of Conversational Speech. Second release. Columbus, OH: Department of Psychology, Ohio State University.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria. www.R-project.org.
Schlüter, Julia. 2005. Rhythmic Grammar: The Influence of Rhythm on Grammatical Variation and Change in English. Berlin/New York: de Gruyter Mouton.
Schlüter, Julia. 2015. Rhythmic Influence on Grammar: Scope and Limitations. In Vogel, Ralf and van de Vijver, Ruben, eds. Rhythm in Cognition and Grammar: A Germanic Perspective. (TiLSM 286). Berlin: de Gruyter Mouton. 179–205.
Schlüter, Julia, and Knappe, Gabriele. 2018. Synonym Selection as a Strategy of Stress Clash Avoidance. In Hoffmann, Sebastian, Sand, Andrea, Arndt-Lappe, Sabine and Dillmann, Lisa, eds. Corpora and Lexis. Amsterdam: Brill. 69–105.
Schweitzer, Katrin, Walsh, Michael, Calhoun, Sasha et al. 2015. Exploring the Relationship between Intonation and the Lexicon: Evidence for Lexicalised Storage of Intonation. Speech Communication 66. 65–81.
Shattuck-Hufnagel, Stefanie. 1995. The Importance of Phonological Transcription in Empirical Approaches to ‘Stress Shift’ versus ‘Early Accent’: Comments on Grabe & Warren, and Vogel, Bunnell & Hoskins. In Connell, Bruce and Arvaniti, Amalia, eds. Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV. Cambridge: Cambridge University Press. 128–40.
Shattuck-Hufnagel, Stefanie. 2019. Toward an (Even) More Comprehensive Model of Speech Production Planning. Language, Cognition and Neuroscience 34(9). 1202–13.
Shattuck-Hufnagel, Stefanie, Ostendorf, Mari and Ross, K. 1994. Stress Shift and Early Pitch Accent Placement in Lexical Items in American English. Journal of Phonetics 22(4). 357–88.
Shih, Stephanie S. 2017. Phonological Influences in Syntactic Alternations. In Gribanova, Vera and Shih, Stephanie S., eds. The Morphosyntax-Phonology Connection. Oxford: Oxford University Press. 223–52.
Shih, Stephanie, Grafmiller, Jason, Futrell, Richard and Bresnan, Joan. 2015. Rhythm’s Role in Genitive Construction Choice in Spoken English. In Vogel, Ralf and van de Vijver, Ruben, eds. Rhythm in Cognition and Grammar: A Germanic Perspective. (TiLSM 286). Berlin: de Gruyter Mouton. 207–34.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Steen, Francis F., Hougaard, Anders, Joo, Jungseock et al. 2018. Toward an Infrastructure for Data-Driven Multimodal Communication Research. Linguistics Vanguard 4(1). https://doi.org/10.1515/lingvan-2017-0041.
Sweet, Henry. 1876. Words, Logic, and Grammar. In Transactions of the Philological Society, 1875–1876. 470–503.
Tilsen, Sam. 2012. Utterance Preparation and Stress Clash: Planning Prosodic Alternations. In Fuchs, Susanne, ed. Speech Planning and Dynamics (Speech production and perception 1). Frankfurt am Main/New York: Peter Lang. 115–50.
Tomlinson, John M., Liu, Qiang and Fox Tree, Jean E. 2014. The Perceptual Nature of Stress Shifts. Language, Cognition and Neuroscience 29(9). 1046–58.
Uhrig, Peter. 2018. NewsScape and the Distributed Little Red Hen Lab: A Digital Infrastructure for the Large-Scale Analysis of TV Broadcasts. In Zwierlein, Anne-Julia, Petzold, Jochen, Boehm, Katharina and Decker, Martin, eds. Anglistentag 2018 in Regensburg: Proceedings. (Proceedings of the Conference of the German Association of University Teachers of English). Trier: Wissenschaftlicher Verlag Trier. 99–114.
Wennerstrom, Ann. 1993. Focus on the Prefix: Evidence for Word-Internal Prosodic Words. Phonology 10. 309–24.
Wells, John. 2008. Longman Pronunciation Dictionary. 3rd ed. Harlow: Pearson Longman.
Wiese, Richard. 2016. Prosodic Parallelism: Comparing Spoken and Written Language. Frontiers in Psychology 7. 1598.
