

Published online by Cambridge University Press:  07 April 2022

Jesse Egbert
Northern Arizona University
Douglas Biber
Northern Arizona University
Bethany Gray
Iowa State University


Designing and Evaluating Language Corpora
A Practical Framework for Corpus Representativeness
pp. 271–279
Publisher: Cambridge University Press
Print publication year: 2022



Aarts, J., & Meijs, W. (eds.) 1984. Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English Language Research. Brill.
Afros, E. 2014. Replying/responding to criticism in language studies. English for Specific Purposes 34: 79–89.
Aijmer, K. 2002. Modality in advanced Swedish learners’ written interlanguage. In Granger, S., Hung, J., & Petch-Tyson, S. (eds.), Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. 55–76. John Benjamins.
Andaleeb, S. S., & Conway, C. 2006. Customer satisfaction in the restaurant industry: An examination of the transaction-specific model. Journal of Services Marketing 20(1): 3–11.
Anderson, W., & Corbett, J. 2009. Exploring English with Online Corpora: An Introduction. Palgrave.
Anderson, M., & Magruder, J. 2012. Learning from the crowd: Regression discontinuity estimates of the effects of an online review database. Economic Journal 122(563): 957–89.
Anthony, L. 2017. Corpus linguistics and vocabulary: A commentary on four studies. Vocabulary Learning and Instruction 6(2): 79–87.
Atkins, S., Clear, J., & Ostler, N. 1992. Corpus design criteria. Literary and Linguistic Computing 7(1): 1–16.
Aull, L., & Brown, D. W. 2013. Fighting words: A corpus analysis of gender representations in sports reportage. Corpora 8(1): 27–52.
Avram, A., Păiş, V., & Tufiş, D. 2020. Towards a Romanian end-to-end automatic speech recognition based on DEEPSPEECH2. Proceedings of the Romanian Academy (Series A) 21(4): 395–402.
Baker, P., & Egbert, J. (eds.) 2016. Triangulating Methodological Approaches in Corpus Linguistic Research. Routledge.
Banner, L. W. 2009. Biography as history. American Historical Review 114(3): 579–86.
Beaugrande, R.-A. de, & Dressler, W. 1981. Introduction to Text Linguistics. Routledge.
Becher, T. 1994. The significance of disciplinary differences. Studies in Higher Education 19(2): 151–61.
Becker, K., & Gray, B. 2018. A situational and linguistic analysis of evaluative and primary research articles: Exploring academic journal registers. Paper presented at American Association for Applied Linguistics 2018 Conference. Chicago, IL.
Belmore, N. 1998. First use of the term “corpus linguistics.” Corpora listserv. ICAME, July 6, 1998. Listserv posting. Accessed January 5, 2021.
Bennett, G. 2010. Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers. Michigan ELT.
Berber-Sardinha, T., Kauffmann, C., & Mayer Acunzo, C. 2014. A multi-dimensional analysis of register variation in Brazilian Portuguese. Corpora 9(2): 239–71.
Beresewicz, M. 2017. A two-step procedure to measure representativeness of internet data sources. International Statistical Review 85(3): 473–93.
Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.
Biber, D. 1993. Representativeness in corpus design. Literary and Linguistic Computing 8(4): 243–57.
Biber, D., & Conrad, S. 2019. Register, Genre, and Style. 2nd ed. Cambridge University Press.
Biber, D., Conrad, S., & Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press.
Biber, D., Conrad, S., Reppen, R., et al. 2004. Representing language use in the university: Analysis of the TOEFL 2000 Spoken and Written Academic Language Corpus. TOEFL Monograph Series MS-25. Educational Testing Service.
Biber, D., & Egbert, J. 2018. Register Variation Online. Cambridge University Press.
Biber, D., Egbert, J., & Davies, M. 2015. Exploring the composition of the searchable Web: A corpus-based taxonomy of web registers. Corpora 10(1): 11–45.
Biber, D., Finegan, E., & Atkinson, D. 1994. ARCHER and its challenges: Compiling and exploring a representative corpus of historical English registers. In Fries, U., Tottie, G., & Schneider, P. (eds.), Creating and Using English Language Corpora. 1–14. Brill.
Biber, D., & Gray, B. 2016. Grammatical Complexity in Academic English: Linguistic Change in Writing. Cambridge University Press.
Biber, D., Gray, B., & Poonpon, K. 2011. Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly 45(1): 5–35.
Biber, D., Gray, B., Staples, S., & Egbert, J. 2022. The Register-Functional Approach to Grammatical Complexity: Theoretical Foundation, Descriptive Research Findings, Application. Routledge.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Pearson Longman.
Biber, D., Reppen, R., Clark, V., & Walter, J. 2001. Representing spoken language in university settings: The design and construction of the spoken component of the T2K-SWAL Corpus. In Simpson, R. & Swales, J. (eds.), Corpus Linguistics in North America. 48–57. University of Michigan Press.
Björne, J., & Salakoski, T. 2015. TEES 2.2: Biomedical event extraction for diverse corpora. BMC Bioinformatics 16: 1–20.
Blair, E., & Blair, J. 2014. Applied Survey Sampling. Sage.
Blair, I., Urland, G., & Ma, J. 2002. Using internet search engines to estimate word frequency. Behavior Research Methods, Instruments, and Computers 34: 286–90.
Brezina, V. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge University Press.
Burgess, J., & Green, J. 2009. The entrepreneurial vlogger: Participatory culture beyond the professional/amateur divide. In Snickars, P. & Vonderau, P. (eds.), The YouTube Reader. 89–107. National Library of Sweden.
Caine, B. 2010. Biography and History. Macmillan International Higher Education.
Cheng, W. 2011. Exploring Corpus Linguistics: Language in Action. Routledge.
Cheng, W., & Cheng, L. 2014. Epistemic modality in court judgments: A corpus-driven comparison of civil cases in Hong Kong and Scotland. English for Specific Purposes 33: 15–26.
Clear, J. 1992. Corpus sampling. In Keitner, G. (ed.), New Directions in English Language Corpora: Methodology, Results, Software Developments. 21–31. Mouton de Gruyter.
Cortes, V. 2013. The purpose of this study is to: Connecting lexical bundles and moves in research article introductions. Journal of English for Academic Purposes 12: 33–43.
Cortes, V. 2015. Situating lexical bundles in the formulaic language spectrum: Origins and functional analysis developments. In Cortes, V. & Csomay, E. (eds.), Corpus-Based Research in Applied Linguistics. In Honor of Douglas Biber. 197–216. John Benjamins.
Cotos, E., Huffman, S., & Link, S. 2015. Furthering and applying move/step constructs: Technology-driven marshalling of Swalesian genre theory for EAP pedagogy. Journal of English for Academic Purposes 19: 52–72.
Crawford, W., & Csomay, E. 2015. Doing Corpus Linguistics. Routledge.
Curran, J. R., & Osborne, M. 2002. A very large corpus doesn’t always yield reliable estimates. Proceedings of the 6th Conference on Language Learning 20: 1–6.
Davies, M. 2009. The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights. International Journal of Corpus Linguistics 14(2): 159–90.
Davies, M. 2010. The Corpus of Contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing 25(4): 447–64.
Davies, M. 2012. Expanding horizons in historical linguistics with the 400-million word Corpus of Historical American English. Corpora 7(2): 121–57.
Davies, M. n.d. Size. Accessed January 9, 2021.
Davies, M. n.d. The COCA corpus. Accessed January 9, 2021.
Davies, M., & Fuchs, R. 2015. Expanding horizons in the study of world Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbe). English World-Wide 36(1): 1–28.
Diederich, C. 2015. Sensory Adjectives in the Discourse of Food. John Benjamins.
Dobrić, N. 2013. Theory and Practice of Corpus-Based Semantics. Narr.
Egbert, J. 2015. Publication type and discipline variation in published academic writing: Investigating statistical interaction in corpus data. International Journal of Corpus Linguistics 20(1): 1–29.
Egbert, J. 2019. Corpus design and representativeness. In Berber Sardinha, T. & Veirano Pinto, M. (eds.), Multi-dimensional Analysis: Research Methods and Current Issues. 27–42. Bloomsbury Academic.
Egbert, J., Biber, D., & Davies, M. 2015. Developing a bottom-up, user-based method of web register classification. Journal of the Association for Information Science and Technology 66(9): 1817–31.
Egbert, J., Burch, B., & Biber, D. 2020. Lexical dispersion and corpus design. International Journal of Corpus Linguistics 25(1): 89–115.
Egbert, J., Larsson, R., & Biber, D. 2020. Doing Linguistics with a Corpus. Cambridge University Press.
Egbert, J., & Schnur, E. 2018. The role of the text in corpus and discourse analysis: Missing the trees for the forest. In Taylor, C. & Marchi, A. (eds.), Corpus Approaches to Discourse: A Critical Review. 159–73. Routledge.
Francis, W., & Kučera, H. 1979. Brown Corpus manual. Accessed January 10, 2021.
Francis, W., & Kučera, H. 1982. Frequency Analysis of English Usage: Lexicon and Grammar. Houghton Mifflin.
Frobenius, M. 2011. Beginning a monologue: The opening sequence of video blogs. Journal of Pragmatics 43(3): 814–27.
Gardner, D., & Davies, M. 2014. A new academic vocabulary list. Applied Linguistics 35(3): 305–27.
Garner, J. 2016. A phrase-frame approach to investigating phraseology in learner writing across proficiency levels. International Journal of Learner Corpus Research 2(1): 31–68.
Garraty, J. A. 1957. The nature of biography. The Centennial Review of Arts & Science 1(2): 123–41.
Gatto, M. 2014. Web As Corpus: Theory and Practice. Continuum.
Gibson, M. 2016. YouTube and bereavement vlogging: Emotional exchange between strangers. Journal of Sociology 52(4): 631–45.
Gilquin, G., De Cock, S., & Granger, S. 2010. The Louvain International Database of Spoken English Interlanguage. Handbook and CD-ROM. Presses universitaires de Louvain.
Gilquin, G., & Granger, S. 2011. From EFL to ESL: Evidence from the International Corpus of Learner English. In Mukherjee, J. (ed.), Exploring Second-Language Varieties of English and Learner Englishes: Bridging a Paradigm Gap. 55–78. John Benjamins.
Grafmiller, J. 2014. Variation in English genitives across modality and genres. English Language and Linguistics 18(3): 471–96.
Gorsuch, R. L. 2015. Factor Analysis. Routledge.
Gray, B. 2015. Linguistic Variation in Research Articles: When Discipline Tells Only Part of the Story. John Benjamins.
Gray, B., Cotos, E., & Smith, J. 2020. Combining rhetorical move analysis with multi-dimensional analysis: Research writing across disciplines. In Römer, U., Cortes, V., & Friginal, E. (eds.), Advances in Corpus-Based Research on Academic Writing. 138–68. John Benjamins.
Gries, Stefan Th. 2009. What is corpus linguistics? Language and Linguistics Compass 3: 1–17.
Hamilton, N. 2009a. Biography. Harvard University Press.
Hamilton, N. 2009b. How to Do Biography: A Primer. Harvard University Press.
Hanks, P. 2012. The corpus revolution in lexicography. International Journal of Lexicography 25(4): 398–436.
Hartig, A., & Lu, X. 2014. Plain English and legal writing: Comparing expert and novice writers. English for Specific Purposes 33: 87–96.
Hashimoto, B. 2020. “Describing the Language Experience of University Students.” Unpublished doctoral dissertation. Northern Arizona University.
Henry, G. 1990. Practical Sampling. Sage.
Hogenraad, R., & Garagozov, R. 2014. Textual fingerprints of risk of war. Literary and Linguistic Computing 29(1): 41–55.
Hong, K., & Nenkova, A. 2014. Improving the estimation of word importance for news multi-document summarization. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 712–21. Sweden.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge University Press.
Hyland, K., & Jiang, F. K. 2018. Academic lexical bundles: How are they changing? International Journal of Corpus Linguistics 23(4): 383–407.
Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., & Suchomel, V. 2013. The TenTen corpus family. 7th International Corpus Linguistics Conference. United Kingdom.
Johansson, S. 1992. The “Using Corpora” Conference, Oxford 1991. ICAME Journal 16: 113–15.
Johansson, S., Leech, G., & Goodluck, H. 1978. Manual of information to accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital computers. Accessed January 8, 2021.
Johnston, T. 2014. The reluctant oracle: Using strategic annotations to add value to, and extract value from, a signed language corpus. Corpora 9(2): 155–89.
Kaeding, F. W. 1897. Die Hilfszeitwörter in ihrem Verhältnis zum Deutschen Wortschatz. Berlin: Selbstverlag des Verfassers.
Keller, D., & Kostromitina, M. 2020. Characterizing non-chain restaurants’ Yelp star-ratings: Generalizable findings from a representative sample of Yelp reviews. International Journal of Hospitality Management 86: 102440.
Kennedy, G. 1998. An Introduction to Corpus Linguistics. Longman.
Kessler-Harris, A. 2009. Why biography? American Historical Review 114(3): 625–30.
Kilgarriff, A., & Grefenstette, G. 2003. Introduction to the special issue on the Web as corpus. Computational Linguistics 29(3): 333–47.
Kilgarriff, A., & Renau, I. 2013. esTenTen, a vast web corpus of peninsular and American Spanish. Procedia: Social and Behavioral Sciences 95: 12–19.
Kilgarriff, A., Rundell, M., & Dhonnchadha, E. U. 2006. Efficient corpus development for lexicography: Building the new Corpus for Ireland. Language Resources and Evaluation 40: 127–52.
Kish, L. 1965. Survey Sampling. Wiley.
Koplenig, A. 2019. Against statistical significance testing in corpus linguistics. Corpus Linguistics and Linguistic Theory 15(2): 321–46.
Kruskal, W., & Mosteller, F. 1979a. Representative sampling, I: Non-scientific literature. International Statistical Review 47(1): 13–24.
Kruskal, W., & Mosteller, F. 1979b. Representative sampling, II: Scientific literature, excluding statistics. International Statistical Review 47(2): 111–27.
Kruskal, W., & Mosteller, F. 1979c. Representative sampling, III: The current statistical literature. International Statistical Review 47(3): 245–65.
Kruskal, W., & Mosteller, F. 1980. Representative sampling, IV: The history of the concept in statistics, 1895–1939. International Statistical Review 48(2): 169–95.
Kübler, S., & Zinsmeister, H. 2015. Corpus Linguistics and Linguistically Annotated Corpora. Bloomsbury.
Kučera, H., & Francis, W. 1967. Computational Analysis of Present-Day American English. Brown University Press.
Kwan, B., Chan, H., & Lam, C. 2012. Evaluating prior scholarship in literature reviews of research articles: A comparative study of practices in two research paradigms. English for Specific Purposes 31: 188–201.
Kopaczyk, J. 2016. Review of Wendy Anderson (ed.), Language in Scotland. Corpus-Based Studies. Northern Scotland 7(1): 112–17.
Labov, W. 2014. The role of African Americans in Philadelphia sound change. Language Variation and Change 26(1): 1–19.
Lange, P. 2014. Commenting on YouTube rants: Perceptions of inappropriateness or civic engagement? Journal of Pragmatics 73: 53–65.
Lauder, A. 2006. Data for lexicography: The central role of the corpus. Wacana 12(2): 220–42.
Leech, G. 1991. The state of the art in corpus linguistics. In Aijmer, K. & Altenberg, B. (eds.), English Corpus Linguistics: Studies in Honor of Jan Svartvik. 8–29. Routledge.
Leech, G. 1992. 100 million words of English: The British National Corpus (BNC). Accessed January 14, 2021.
Leech, G. 2007. New resources, or just better old ones? The Holy Grail of representativeness. In Hundt, M., Nesselhauf, N., & Biewer, C. (eds.), Corpus Linguistics and the Web. 133–49. Brill.
Li, H., Meng, F., Jeong, M., & Zhang, Z. 2020. To follow others or be yourself? Social influence in online restaurant reviews. International Journal of Contemporary Hospitality Management 32(3): 1067–87.
Lindquist, H. 2009. Corpus Linguistics and the Description of English. Edinburgh University Press.
Llanos, L. 2014. A Spanish learner oral corpus for computer-aided error analysis. Corpora 9(2): 207–38.
Luca, M. 2016. Reviews, reputation, and revenue: The case of Yelp.com. Harvard Business School NOM Unit Working Paper No. 12–016.
Manning, C., & Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
McEnery, T., & Hardie, A. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge University Press.
McEnery, T., & Hardie, A. 2013. The history of corpus linguistics. In Allan, K. (ed.), The Oxford Handbook of the History of Linguistics. 727–45. Oxford University Press.
McEnery, T., Xiao, R., & Tono, Y. 2006. Corpus-Based Language Studies: An Advanced Resource Book. Routledge.
McHugh, M. 2008. Standard error: Meaning and interpretation. Biochemia Medica 18(1): 7–13.
Meunier, F., & Reppen, R. 2015. Corpus versus non-corpus-informed pedagogical materials: Grammar as the focus. In Biber, D. & Reppen, R. (eds.), Cambridge Handbook of English Corpus Linguistics. 498–514. Cambridge University Press.
Meyer, R. 2016. How many stories do newspapers publish per day? A look at how the New York Times, the Wall Street Journal, the Washington Post, and BuzzFeed compare. The Atlantic. May 26.
Miller, D. 2012. “The Challenge of Constructing a Reliable Word List: An Exploratory Corpus-Based Analysis of Introductory Psychology Textbooks.” Unpublished doctoral dissertation. Northern Arizona University.
Miller, D., & Biber, D. 2015. Evaluating reliability in quantitative vocabulary studies: The influence of corpus design and composition. International Journal of Corpus Linguistics 20(1): 30–53.
Milroy, L. 1987. Observing and Analyzing Natural Language: A Critical Account of Sociolinguistic Method. Wiley-Blackwell.
Mindt, I. 2011. Adjective Complementation: An Empirical Analysis of Adjectives Followed by That-Clauses. John Benjamins.
Murphy, R. 2020. Local consumer review survey 2020. December 9. Accessed January 9, 2021.
Nelson, M. 2010. Building a written corpus: What are the basics? In O’Keeffe, A. & McCarthy, M. (eds.), The Routledge Handbook of Corpus Linguistics. 53–65. Routledge.
Nesi, H., & Gardner, S. 2012. Genres across the Disciplines: Student Writing in Higher Education. Cambridge University Press.
Newman, J., & Duncan, T. 2019. The subject of ROAR in the mind and in the corpus: What divergent results can teach us. Linguistica Atlantica 37(1): 2–27.
Paperno, D., Marelli, M., Tentori, K., & Baroni, M. 2014. Corpus-based estimates of word association predict biases in judgment of word co-occurrence likelihood. Cognitive Psychology 74: 66–83.
Potter, F. 2008. Precision. In Lavrakas, P. (ed.), Encyclopedia of Survey Research Methods (Volume 1). 598–9. Sage.
Rieger, B. 1979. Repräsentativität: von der unangemessenheit eines begriffs zur kennzeichnung eines problems linguistischer korpusbildung. In Bergenholtz, H. & Schaeder, B. (eds.), Empirische Textwissenschaft: Aufbau und Auswertung von Text-Corpora. 52–70. Cornelsen.
Römer, U., & O’Donnell, M. 2011. From student hard drive to web corpus (part 1): The design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora 6(2): 159–77.
Ruan, Z. 2018. Structural compression in academic writing: An English-Chinese comparison study of complex noun phrases in research article abstracts. Journal of English for Academic Purposes 36: 37–47.
Sankoff, D. 1978. Linguistic Variation: Models and Methods. Academic Press.
Sankoff, D. 1988. Problems of representativeness. In von Ulrich Ammon, H., Dittmar, N., Mattheier, K., and Trudgill, P. (eds.), Sociolinguistics: An International Handbook of the Science of Language and Society (Volume 3). 899–903. Walter de Gruyter.
Särndal, C.-E., Swensson, B., & Wretman, J. 2003. Model Assisted Survey Sampling. Springer.
Saxe, J. G. 1881. The Poems of John Godfrey Saxe. Houghton Mifflin and Company.
Schäfer, R. 2016. On bias-free crawling and representative web corpora. Proceedings of the 10th Web As Corpus Workshop and the EmpiriST Shared Task. 99–105. Association for Computational Linguistics.
Shuy, R. W., Wolfram, W., & Riley, W. K. 1968. Field Techniques in an Urban Language Study. Center for Applied Linguistics.
Siemund, P. 2014. The emergence of English reflexive verbs: An analysis based on the Oxford English Dictionary. English Language & Linguistics 18(1): 49–73.
Simpson, R., Lee, D., & Leicher, S. 2003. MICASE Manual: The Michigan Corpus of Academic Spoken English. English Language Institute of University of Michigan. Accessed October 25, 2021.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford University Press.
Smith, J. 2019. “A Comparison of Prescriptive Usage Problems in Formal and Informal Written English.” Unpublished doctoral dissertation. Iowa State University.
Smith, J., & Adendorff, R. 2014. For the people: Defining communities of readership through an Appraisal comparison of letters to two South African newspapers. Discourse, Context & Media 3: 1–13.
Staples, S., & Reppen, R. 2016. Understanding first-year L2 writing: A lexico-grammatical analysis across L1s, genres, and language ratings. Journal of Second Language Writing 32: 17–35.
Staples, S., Venetis, M., Robinson, J., & Dultz, R. 2020. Understanding the multi-dimensional nature of informational language in health care interactions. Register Studies 2(2): 241–74.
Stefanowitsch, A. 2020. Corpus Linguistics: A Guide to the Methodology. Language Science Press.
Stephan, F., & McCarthy, P. 1958. Sampling Opinions: An Analysis of Survey Procedure. Wiley.
Stuart, A. 1984. The Ideas of Sampling. Oxford University Press.
Stubbs, M. 2006. Corpus analysis: The state of the art and three types of unanswered questions. In Thompson, G. & Hunston, S. (eds.), System and Corpus: Exploring Connections. 15–36. Equinox.
Sudman, S. 1976. Applied Sampling. Academic Press.
Summers, D. 1993. Longman/Lancaster English Language Corpus: Criteria and design. International Journal of Lexicography 6(3): 181–208.
Sun, Y., & Jiang, J. 2014. Metaphor use in Chinese and US corporate mission statements: A cognitive sociolinguistic analysis. English for Specific Purposes 33: 4–14.
Svartvik, J. 1990. The London-Lund Corpus of Spoken English: Description and Research. Lund University Press.
Svartvik, J. & Quirk, R. 1980. A Corpus of English Conversation. Lund Studies in English. Liber/Gleerups.
Tagliamonte, S. 2006. Analysing Sociolinguistic Variation. Cambridge University Press.
Tasker, D. 2019. “Situational and Linguistic Variation in Undergraduate English-Department Student Writing.” Unpublished doctoral dissertation. Northern Arizona University.
Taylor, C. 2008. What is corpus linguistics? What the data says. ICAME Journal 32: 179–200.
Teubert, W. 2005. My version of corpus linguistics. International Journal of Corpus Linguistics 10(3): 1–13.
Thorndike, E. L. 1921. The Teacher’s Word Book of 10,000 Words. Teachers College Press.
Timmis, I. 2015. Corpus Linguistics for ELT: Research and Practice. Routledge.
Titscher, S., Meyer, M., Wodak, R., & Vetter, E. 2000. Methods of Text and Discourse Analysis. Sage.
US Bureau of Labor Statistics. 2021. Occupational Outlook Handbook. Accessed January 9, 2021.
Váradi, T. 2001/3. The linguistic relevance of corpus linguistics. Proceedings of the Corpus Linguistics 2001 Conference. 587–93.
Watson, A. 2020. Number of daily newspapers in the U.S. 1970–2018. Accessed January 9, 2021.
Weisser, M. 2016. Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis. Wiley.
Werner, E. A. 2012. “Rants, Reactions, and Other Rhetorics: Genres of the YouTube Vlog.” Unpublished doctoral dissertation. University of North Carolina at Chapel Hill.
Wesch, M. 2009. YouTube and you: Experiences of self-awareness in the context collapse of the recording webcam. Explorations in Media Ecology 8(2): 19–34.
West, M. 1953. A General Service List of English Words. Longman, Green & Company.
Whitaker’s Books in Print 1992: The Reference Catalogue of Current Literature. J. Whitaker & Sons Ltd.
White, M. 1994. “Language in Job Interviews: Differences Related to Success and Socioeconomic Variables.” Unpublished PhD dissertation. Northern Arizona University.
Wood, D. C., & Appel, R. 2014. Multiword constructions in first year business and engineering university textbooks and EAP textbooks. Journal of English for Academic Purposes 15: 1–13.
Woods, A., Fletcher, P., & Hughes, A. 1986. Statistics in Language Studies. Cambridge University Press.
Zhao, G., Sonsaat, S., Silpachai, A., Lucic, I., Chukharev-Hudilainen, E., Levis, J., & Gutierrez-Osuna, R. 2018. L2-ARCTIC: A non-native English speech corpus. English Publications 226. Retrieved January 9, 2021.
Zipf, G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press.
