
11 - Comparing Annotation Types and n-Gram Sizes

German Discourse Particles and Their English Reflexes in a Translation Corpus

from Part IV - Applications of Classification-Based Approaches

Published online by Cambridge University Press:  06 May 2022

Ole Schützler, Universität Leipzig
Julia Schlüter, Universität Bamberg

Summary

The chapter addresses a problem of contrastive pragmatics: how can we study correspondences between pragmatic markers in two languages if one language has a class of elements that the other lacks? Specifically, the contribution deals with the German modal particles ja and doch and their reflexes in English translations. As there is no predetermined set of potential English correspondences, traditional distributional analyses are not feasible, and methods from Natural Language Processing are explored instead. Using 32 types of n-grams, differing in length and type of annotation, three classification tasks are carried out in order to identify cues in the English translations that reflect the presence (or absence) of a particle in the German original. The results show that lemma unigrams and lemma bigrams are often the most informative (i.e. they yield the most accurate classifications), while trigrams and 1-skip-2-grams provide important information about concomitants of modal particles that unigrams and bigrams miss. More generally, the findings indicate that the linguistic observables (here, n-grams) used as the basis of quantitative analyses need to be carefully selected and explored in terms of their contribution to linguistic analysis.
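To make the n-gram types named in the summary concrete, the following is a minimal Python sketch of how contiguous n-grams and 1-skip-2-grams might be extracted from a lemmatised sentence. It is an illustration only, not the chapter's implementation: the function names, the example lemma sequence and the reading of "1-skip-2-gram" as a pair of tokens separated by exactly one intervening token are all assumptions (some definitions also count adjacent bigrams as skip-grams).

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def one_skip_bigrams(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return token pairs separated by exactly one skipped token.

    Assumption: "1-skip-2-gram" is read here as "exactly one gap";
    other definitions include the adjacent bigrams as well.
    """
    return [(tokens[i], tokens[i + 2]) for i in range(len(tokens) - 2)]


# Hypothetical lemmatised English sentence, invented for illustration.
lemmas = ["as", "you", "know", "this", "be", "true"]

unigrams = ngrams(lemmas, 1)
bigrams = ngrams(lemmas, 2)
trigrams = ngrams(lemmas, 3)
skipgrams = one_skip_bigrams(lemmas)

print(bigrams)
print(trigrams)
print(skipgrams)
```

In a classification setting of the kind the summary describes, the counts of such features in each English sentence could serve as predictors of whether the German original contains a particle. The same functions would apply to other annotation layers (for example, part-of-speech tags instead of lemmas) simply by passing in a different token sequence.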

Type: Chapter
Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 323–352
Publisher: Cambridge University Press
Print publication year: 2022


