Comparing Annotation Types and n-Gram Sizes

doi:10.1017/9781108589314.012

11 - Comparing Annotation Types and n-Gram Sizes

German Discourse Particles and Their English Reflexes in a Translation Corpus

from Part IV - Applications of Classification-Based Approaches

Published online by Cambridge University Press: 06 May 2022

Volker Gast

Edited by Ole Schützler and Julia Schlüter


Ole Schützler, Universität Leipzig
Julia Schlüter, Universität Bamberg


Summary

The chapter addresses a problem of contrastive pragmatics: how can we study correspondences between pragmatic markers in two languages if one language has a class of elements that the other lacks? Specifically, the contribution deals with the German modal particles ja and doch and their reflexes in English translations. As there is no predetermined set of potential English correspondences, traditional distributional analyses are not feasible, and methods from Natural Language Processing are explored instead. Using 32 types of n-grams, differing in length and type of annotation, three classification tasks are carried out to identify cues in the English translations that reflect the presence (or absence) of a particle in the German original. The results show that lemma unigrams and lemma bigrams are often the most informative (i.e. they yield the most accurate classifications), while trigrams and 1-skip-2-grams provide important information about concomitants of modal particles that unigrams and bigrams miss. More generally, the results show that the linguistic observables (n-grams) serving as the basis of quantitative analyses need to be carefully selected and explored in terms of their contribution to linguistic analysis.
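The n-gram types named in the summary can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the example sentence, and the reading of "1-skip-2-gram" as a pair of tokens separated by exactly one skipped token are all assumptions made for illustration (some definitions of skip-grams also count adjacent pairs).

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def one_skip_bigrams(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return token pairs separated by exactly one skipped token.

    Assumption: this is only one possible reading of "1-skip-2-gram".
    """
    return [(tokens[i], tokens[i + 2]) for i in range(len(tokens) - 2)]


# Hypothetical lemmatized English sentence (not taken from the corpus).
lemmas = ["as", "you", "know", "we", "have", "discuss", "this"]

unigrams = ngrams(lemmas, 1)           # single lemmas
bigrams = ngrams(lemmas, 2)            # adjacent pairs
trigrams = ngrams(lemmas, 3)           # adjacent triples
skipgrams = one_skip_bigrams(lemmas)   # pairs with one token in between
```

The same functions could be applied to other annotation layers (for example word forms or part-of-speech tags instead of lemmas), which is one way to arrive at several n-gram types per sentence.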

Keywords

modal particles; annotation types; contrastive linguistics; classification; translation; n-grams

Type: Chapter
Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 323–352

DOI: https://doi.org/10.1017/9781108589314.012

Publisher: Cambridge University Press

Print publication year: 2022


