
Enhancing deep neural networks with morphological information

Published online by Cambridge University Press: 21 February 2022

Matej Klemen*
Affiliation:
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
Luka Krsnik
Affiliation:
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
Marko Robnik-Šikonja
Affiliation:
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
*Corresponding author. E-mail: matej.klemen@fri.uni-lj.si

Abstract

Deep learning approaches are superior in natural language processing due to their ability to extract informative features and patterns from language. The two most successful neural architectures are LSTMs and transformers, the latter being used in large pretrained language models such as BERT. While cross-lingual approaches are on the rise, most current natural language processing techniques are designed for and applied to English, and less-resourced languages lag behind. In morphologically rich languages, much information is conveyed through morphology, for example, through affixes modifying the stems of words. Existing neural approaches do not explicitly use this information on word morphology. We analyse the effect of adding morphological features to LSTM and BERT models. As a testbed, we use three tasks available in many less-resourced languages: named entity recognition (NER), dependency parsing (DP), and comment filtering (CF). We construct baselines involving LSTM and BERT models, which we extend with additional input in the form of part-of-speech (POS) tags and universal features. We compare the models across several languages from different language families. Our results suggest that adding morphological features has mixed effects, depending on the quality of the features and on the task. The features improve the performance of LSTM-based models on the NER and DP tasks but do not benefit performance on the CF task. For BERT-based models, the added morphological features improve performance on DP only when they are of high quality (i.e., manually checked), while predicted features show no practical improvement. Even for high-quality features, the improvements are less pronounced in language-specific BERT variants than in massively multilingual BERT models. As manually checked features are not available for the NER and CF datasets, we experiment only with predicted features there and find that they bring no practical improvement in performance.
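
To make the abstract's approach concrete, below is a minimal sketch (our illustration, not the authors' released code) of the general idea: morphological information, here POS tags, is embedded separately and concatenated with word embeddings before an LSTM sequence tagger. All names (e.g., MorphAwareLSTMTagger), dimensions, and the toy tagset and label counts are assumptions made for illustration only.

import torch
import torch.nn as nn

class MorphAwareLSTMTagger(nn.Module):
    """Sketch of an LSTM sequence tagger with extra morphological input."""
    def __init__(self, vocab_size, pos_tagset_size, num_labels,
                 word_dim=100, pos_dim=25, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Additional input: embeddings of POS tags (or universal features).
        self.pos_emb = nn.Embedding(pos_tagset_size, pos_dim)
        self.lstm = nn.LSTM(word_dim + pos_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, word_ids, pos_ids):
        # Concatenate word and POS embeddings per token, then tag each token.
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        out, _ = self.lstm(x)
        return self.classifier(out)  # (batch, seq_len, num_labels) logits

# Toy usage: one 5-token sentence with hypothetical word/POS indices.
model = MorphAwareLSTMTagger(vocab_size=1000, pos_tagset_size=18, num_labels=9)
words = torch.randint(0, 1000, (1, 5))
pos = torch.randint(0, 18, (1, 5))
logits = model(words, pos)  # shape: (1, 5, 9)

The same idea carries over to BERT-based models, for example by concatenating such feature embeddings with the transformer's contextual token representations before the task-specific classification layer; the paper evaluates both LSTM- and BERT-based variants.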

Type: Article
Copyright: © The Author(s), 2022. Published by Cambridge University Press

