
Transliteration between spoken language corpora

Published online by Cambridge University Press: 08 June 2005

Jens Allwood
Affiliation:
Department of Linguistics, Göteborg University, Box 200, SE-405 30 Göteborg, Sweden. E-mail: jens@ling.gu.se, leifg@ling.gu.se, eliza@ling.gu.se, mgunnar@ling.gu.se
Peter Juel Henrichsen
Affiliation:
Center for Computational Modelling of Language, Copenhagen Business School, Bernhard Bangs Alle 17B, DK-2000 Frederiksberg, Denmark. E-mail: pjuel@id.cbs.dk
Leif Grönqvist
Affiliation:
Department of Linguistics, Göteborg University, Box 200, SE-405 30 Göteborg, Sweden. E-mail: jens@ling.gu.se, leifg@ling.gu.se, eliza@ling.gu.se, mgunnar@ling.gu.se
Elisabeth Ahlsén
Affiliation:
Department of Linguistics, Göteborg University, Box 200, SE-405 30 Göteborg, Sweden. E-mail: jens@ling.gu.se, leifg@ling.gu.se, eliza@ling.gu.se, mgunnar@ling.gu.se
Magnus Gunnarsson
Affiliation:
Department of Linguistics, Göteborg University, Box 200, SE-405 30 Göteborg, Sweden. E-mail: jens@ling.gu.se, leifg@ling.gu.se, eliza@ling.gu.se, mgunnar@ling.gu.se

Abstract

Comparing languages and linguistic data is essential if we are to make progress in understanding the nature of spoken language: we understand phenomena better through comparison and contrast. This paper discusses the problems that arise when a spoken language corpus transcribed and formatted according to one standard is transferred into the standard and format of another corpus. These problems stem both from differences between the corpora's standards and from human errors made while creating the transcriptions, which reduce their reliability. Although the discussion is based on transfer and transliteration between two specific corpora (the Danish BySoc, BySociolingvistisk Korpus, and the Swedish GSLC, Göteborg Spoken Language Corpus), we believe that the article documents and highlights general problems that must be faced whenever spoken language corpora in different formats are to be compared.

Type
Research Article
Copyright
© 2005 Cambridge University Press
