
Transfer learning for Turkish named entity recognition on noisy text

Published online by Cambridge University Press: 28 January 2020

Emre Kağan Akkaya
Affiliation:
Department of Computer Engineering, Hacettepe University, Turkey
Burcu Can*
Affiliation:
Department of Computer Engineering, Hacettepe University, Turkey
*Corresponding author. E-mail: burcucan@cs.hacettepe.edu.tr

Abstract

In this article, we investigate the use of deep neural networks with different word representation techniques for named entity recognition (NER) on Turkish noisy text. We argue that valuable latent features for NER can, in fact, be learned without any hand-crafted features and/or domain-specific resources such as gazetteers and lexicons. To this end, we utilize character-level, character n-gram-level, morpheme-level, and orthographic character-level word representations. Since noisy data with NER annotation are scarce for Turkish, we introduce a transfer learning model in order to learn infrequent entity types: we extend the Bi-LSTM-CRF architecture with an additional conditional random field (CRF) layer that is trained simultaneously on a larger (but formal) corpus and on the noisy text. This allows us to learn from both formal and informal/noisy text, further improving the performance of our model on rarely seen entity types. We experimented on Turkish as a morphologically rich language and on English as a relatively morphologically poor language. We obtained an entity-level F1 score of 67.39% on Turkish noisy data and 45.30% on English noisy data, outperforming the current state-of-the-art models on noisy text. The English scores are lower than the Turkish scores because of the severe sparsity introduced by users' writing styles. The results show that using subword information contributes significantly to learning latent features for morphologically rich languages.
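The transfer learning extension described above can be pictured as a single Bi-LSTM encoder shared between a formal corpus and a noisy corpus, with a separate output layer for each. The following minimal PyTorch sketch illustrates only that shared-encoder, two-head layout; the class and parameter names are hypothetical, and for brevity each CRF layer is replaced by a plain per-token linear scoring head rather than the CRF training and decoding used in the paper.

```python
# A minimal sketch of the shared-encoder, two-head idea (not the authors' code).
# The per-token linear heads stand in for the paper's two CRF layers.
import torch
import torch.nn as nn

class SharedBiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared bidirectional LSTM encoder over (sub)word representations.
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                               bidirectional=True)
        # Two task-specific output heads: one for the formal corpus,
        # one for the noisy corpus.
        self.formal_head = nn.Linear(hidden_dim, num_tags)
        self.noisy_head = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids, domain):
        hidden, _ = self.encoder(self.embed(token_ids))
        head = self.formal_head if domain == "formal" else self.noisy_head
        return head(hidden)  # per-token tag scores

# Alternating mini-batches from the two corpora trains the shared encoder on
# both domains, while each head specialises in its own label distribution.
model = SharedBiLSTMTagger(vocab_size=10000, num_tags=9)
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 10000, (2, 12))   # toy batch: 2 sentences, 12 tokens
gold = torch.randint(0, 9, (2, 12))         # toy gold tag ids
scores = model(tokens, domain="noisy")
loss = loss_fn(scores.view(-1, 9), gold.view(-1))
loss.backward()
```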

Type: Article
Copyright: © Cambridge University Press 2020


Supplementary material: Kağan Akkaya and Can Supplementary Materials (File, 35.2 KB)