Constrained language use in Finnish: A corpus-driven approach

Ilmari Ivaska and Silvia Bernardini


It has been suggested that second languages and translated languages are constrained by an interplay of several linguistic systems. This paper reports on a data-driven quantitative study of constrained Finnish. We detect linguistic phenomena that distinguish constrained from non-constrained Finnish across constrained varieties, first/source languages, and registers. Using a two-phase method, we first apply Boruta feature selection to detect key quantitative differences in syntactically defined part-of-speech (POS) bigrams between each variety-, language-pair- and register-specific constrained dataset and its non-constrained counterpart. We then use the selected bigrams as variables in a Multi-dimensional Analysis. The results show that both nominal complexity and verbal/clausal complexity distinguish constrained from non-constrained Finnish. These differences interact with both type of constraint and register: the constrained varieties are less sensitive to register differences, and this tendency is more pronounced in learner Finnish than in translated Finnish. Leaving out any of these variables from the analysis would blur our view of this multi-faceted phenomenon.
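The input side of this pipeline can be illustrated with a minimal Python sketch, assuming dependency-parsed input in the Universal Dependencies CoNLL-U format. It reads a "syntactically defined POS bigram" as the part-of-speech pair of a head and its dependent, which is an assumption rather than the authors' documented definition, and normalises counts per 1,000 tokens. The second function shows how a Biber-style Multi-dimensional Analysis turns such feature rates into one dimension score per text. Boruta feature selection and the factor analysis that would supply the positively and negatively loading features are not reproduced here; the function names and the `positive`/`negative` inputs are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def parse_conllu_tokens(conllu_sentence):
    """Return the token rows of one CoNLL-U sentence as lists of 10 columns."""
    rows = []
    for line in conllu_sentence.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Keep ordinary tokens only; skip multiword ranges ("3-4") and empty nodes ("5.1").
        if cols[0].isdigit():
            rows.append(cols)
    return rows


def bigram_rates(sentences, per=1000):
    """Count (head UPOS, dependent UPOS) pairs and normalise them per `per` tokens.

    ASSUMPTION: a "syntactically defined POS bigram" is read here as a head and
    its dependent linked by one dependency arc, not two linearly adjacent tokens.
    """
    counts = Counter()
    n_tokens = 0
    for sent in sentences:
        rows = parse_conllu_tokens(sent)
        n_tokens += len(rows)
        upos = {int(r[0]): r[3] for r in rows}  # column 4 (index 3) is UPOS
        for r in rows:
            head = int(r[6])  # column 7 (index 6) is HEAD; 0 marks the root
            if head != 0:
                counts[(upos[head], r[3])] += 1
    return {pair: n / n_tokens * per for pair, n in counts.items()} if n_tokens else {}


def dimension_scores(texts, positive, negative):
    """Biber-style score per text: summed z-scores of features with salient positive
    loadings minus summed z-scores of features with salient negative loadings.

    `positive` and `negative` are HYPOTHETICAL inputs that would come from a
    factor analysis, which this sketch does not perform.
    """
    stats = {}
    for f in set(positive) | set(negative):
        values = [t.get(f, 0.0) for t in texts]
        stats[f] = (mean(values), pstdev(values))

    def z(text, f):
        mu, sd = stats[f]
        return 0.0 if sd == 0 else (text.get(f, 0.0) - mu) / sd

    return [sum(z(t, f) for f in positive) - sum(z(t, f) for f in negative) for t in texts]


if __name__ == "__main__":
    # "Kissa näkee koiran ." ('The cat sees the dog.'), hand-annotated for illustration.
    rows = [
        ["1", "Kissa", "kissa", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
        ["2", "näkee", "nähdä", "VERB", "_", "_", "0", "root", "_", "_"],
        ["3", "koiran", "koira", "NOUN", "_", "_", "2", "obj", "_", "_"],
        ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
    ]
    sentence = "\n".join("\t".join(r) for r in rows)
    print(bigram_rates([sentence]))  # {('VERB', 'NOUN'): 500.0, ('VERB', 'PUNCT'): 250.0}
    print(dimension_scores([{"a": 1.0, "b": 3.0}, {"a": 3.0, "b": 1.0}], ["a"], ["b"]))  # [-2.0, 2.0]
```

Z-scores use the population standard deviation in this sketch; a real analysis would standardise each feature over the whole corpus of texts before summing.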



