
7 - Leveraging Large Corpora for Translation Using Sketch Engine

Published online by Cambridge University Press:  10 June 2019

Meng Ji
Affiliation: University of Sydney
Michael Oakes
Affiliation: University of Wolverhampton

Summary

Translation is a demanding, competitive, and labour-intensive activity that requires more than just advanced language skills. If translators are to thrive in such a cutting-edge working environment, they need to possess good computer literacy skills and be well-versed in the art of strategic information mining - in other words, they need to know how to make best use of the plethora of dictionaries, glossaries, books, websites, and other online resources at their disposal in order to find the required information in the shortest amount of time possible. This includes finding translation equivalents and domain-specific terms, checking collocations, idioms, and phrasal verbs, and exploring the way words are used in context by native speakers in order to render the text grammatically, semantically, and stylistically appropriate. Corpora, i.e. large electronic collections of computer-processed, linguistically annotated texts, can serve as an invaluable source of linguistic information, providing the translator with a sample of genuine texts in the target language and, in some cases, translations of similar texts. Over the past two decades, a wide variety of monolingual, comparable, and parallel corpora have been compiled, and these can be searched using either downloadable or online corpus query systems (CQS) such as the Sketch Engine. Whilst computer-assisted translation (CAT) tools such as SDL Trados and MemoQ are a staple of the translation process, translators are still relatively conservative when it comes to using corpora and CQSs to inform their practice. This chapter will introduce the Sketch Engine as a powerful suite of corpus tools for translation and cross-linguistic analysis. Centred around a case study featuring a real-life translation scenario, it will discuss various features of the software which can be used to leverage very large web-based monolingual and parallel corpora for translation, i.e. the concordancer, the word sketches tool, the thesaurus, the term extraction feature, and the corpus building tool.

Type: Chapter
Information: Advances in Empirical Translation Studies: Developing Translation Resources and Technologies, pp. 110-144
Publisher: Cambridge University Press
Print publication year: 2019
