21 - Corpus-Based Methodologies in the Study of Heritage Languages

from Part II - Research Approaches to Heritage Languages

Published online by Cambridge University Press:  04 November 2021

Silvina Montrul
Affiliation:
University of Illinois, Urbana-Champaign
Maria Polinsky
Affiliation:
University of Maryland, College Park

Summary

Corpus linguistics gained prominence in the study of language, both of standard and learner varieties, in the 1990s, as technological advances allowed for quick and reliable analyses of large volumes of language data. Computer-aided analyses of large, principled collections of authentic texts, known as language corpora, brought new insights into the nature of language and afforded a more nuanced understanding of linguistic structure, language change, and language development. The chapter provides an overview of the key principles of corpus linguistics methods and of some frequently used corpus instruments and procedures; it explores the potential benefits of applying corpus linguistics methods and instruments to the study of heritage languages, using the examples of a few existing heritage language corpora. Overall, the chapter aims to encourage the heritage language community to invest energy in the development of heritage language corpora and to make better use of existing computational tools in the study of heritage languages.

Type
Chapter
Information
Publisher: Cambridge University Press
Print publication year: 2021


