
Professional language in Swedish clinical text: Linguistic characterization and comparative studies

  • Kelly Smith (a1), Beata Megyesi (a2), Sumithra Velupillai (a3) and Maria Kvist (a4)


This study investigates the linguistic characteristics of Swedish clinical text in radiology reports and doctors' daily notes from electronic health records (EHRs), compared with general Swedish and biomedical journal text. We quantify linguistic features through a comparative register analysis to determine how the free text in EHRs differs from general and biomedical Swedish text in terms of lexical complexity, word and sentence composition, and common sentence structures. The linguistic features are extracted using state-of-the-art computational tools: a tokenizer, a part-of-speech tagger, and scripts for statistical analysis. Results show that technical terms and abbreviations are more frequent in clinical text, and that lexical variance is low. Moreover, clinical text frequently omits subjects, verbs, and function words, resulting in shorter sentences. Clinical text not only differs from general Swedish but also varies internally across its sub-domains; for example, sentences lacking verbs are significantly more frequent in radiology reports. These results provide a foundation for the future development of automatic methods for EHR simplification or clarification.
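The abstract does not give exact feature definitions, so the following is only a minimal sketch of how a few of the feature types it mentions (lexical variance, sentence length, verbless sentences, function-word use) could be computed once text has been tokenized and part-of-speech tagged. It assumes coarse SUC (Stockholm-Umeå Corpus) style tags such as "VB" for verbs and "MAD"/"MID"/"PAD" for punctuation; the set of "function word" tags, the `profile` function, and the toy sentences are hypothetical illustrations, not the authors' pipeline or data. LIX, a long-established Swedish readability index (words per sentence plus the percentage of words longer than six letters), is added as one further illustrative measure.

```python
from dataclasses import dataclass

# Assumption: (token, tag) pairs use coarse SUC (Stockholm-Umeå Corpus) tags,
# e.g. "VB" for verbs and "MAD"/"MID"/"PAD" for punctuation.
VERB_TAGS = {"VB"}
PUNCT_TAGS = {"MAD", "MID", "PAD"}
# Hypothetical, illustrative choice of tags treated as function words:
# prepositions, conjunctions, subjunctions, determiners,
# the infinitive marker and pronouns.
FUNCTION_TAGS = {"PP", "KN", "SN", "DT", "IE", "PN"}

Sentence = list[tuple[str, str]]  # one sentence as (token, tag) pairs


@dataclass
class RegisterProfile:
    type_token_ratio: float         # distinct word forms / word tokens (lexical variance)
    mean_sentence_length: float     # word tokens per sentence
    mean_word_length: float         # characters per word token
    verbless_sentence_share: float  # share of sentences with no VB-tagged token
    function_word_share: float      # FUNCTION_TAGS tokens / word tokens
    lix: float                      # LIX readability index


def profile(sentences: list[Sentence]) -> RegisterProfile:
    """Compute simple register features over already tokenized and tagged sentences."""
    # Word tokens exclude punctuation; lowercasing makes the type count case-insensitive.
    words = [tok.lower() for sent in sentences for tok, tag in sent if tag not in PUNCT_TAGS]
    n_words, n_sents = len(words), len(sentences)
    if n_words == 0:
        raise ValueError("input must contain at least one non-punctuation token")
    verbless = sum(1 for sent in sentences if not any(tag in VERB_TAGS for _, tag in sent))
    function = sum(1 for sent in sentences for _, tag in sent if tag in FUNCTION_TAGS)
    # LIX counts words longer than six characters.
    long_words = sum(1 for w in words if len(w) > 6)
    return RegisterProfile(
        type_token_ratio=len(set(words)) / n_words,
        mean_sentence_length=n_words / n_sents,
        mean_word_length=sum(map(len, words)) / n_words,
        verbless_sentence_share=verbless / n_sents,
        function_word_share=function / n_words,
        lix=n_words / n_sents + 100 * long_words / n_words,
    )


# Toy, hand-tagged examples (hypothetical, not data from the study).
radiology_style = [
    [("Ingen", "DT"), ("pneumoni", "NN"), (".", "MAD")],
    [("Oförändrad", "JJ"), ("pleuravätska", "NN"), ("bilateralt", "AB"), (".", "MAD")],
]
general_style = [
    [("Patienten", "NN"), ("har", "VB"), ("inte", "AB"), ("någon", "DT"),
     ("lunginflammation", "NN"), (".", "MAD")],
]
print(profile(radiology_style))
print(profile(general_style))
```

On this tiny hand-tagged input, the radiology-style sentences come out verbless and shorter than the general-style sentence. That only illustrates how such features are counted; it says nothing about real corpora, where tagger errors, abbreviations, and tokenization choices would all affect the numbers.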


