
Enhancing deep neural networks with morphological information

Published online by Cambridge University Press: 21 February 2022

Matej Klemen*
Affiliation:
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
Luka Krsnik
Affiliation:
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
Marko Robnik-Šikonja
Affiliation:
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia
*Corresponding author. E-mail: matej.klemen@fri.uni-lj.si

Abstract

Deep learning approaches are superior in natural language processing due to their ability to extract informative features and patterns from language. The two most successful neural architectures are LSTMs and transformers, the latter being used in large pretrained language models such as BERT. While cross-lingual approaches are on the rise, most current natural language processing techniques are designed for and applied to English, and less-resourced languages lag behind. In morphologically rich languages, much information is conveyed through morphology, for example, through affixes modifying the stems of words. Existing neural approaches do not explicitly use this information on word morphology. We analyse the effect of adding morphological features to LSTM and BERT models. As a testbed, we use three tasks available in many less-resourced languages: named entity recognition (NER), dependency parsing (DP), and comment filtering (CF). We construct baselines involving LSTM and BERT models, which we extend with additional input in the form of part-of-speech (POS) tags and universal features. We compare the models across several languages from different language families. Our results suggest that adding morphological features has mixed effects, depending on the quality of the features and on the task. The features improve the performance of LSTM-based models on the NER and DP tasks but do not benefit performance on the CF task. For BERT-based models, the added morphological features improve performance on DP only when they are of high quality (i.e., manually checked), while predicted features show no practical improvement. Even for high-quality features, the improvements are less pronounced in language-specific BERT variants than in massively multilingual BERT models. As manually checked features are not available for the NER and CF datasets, we experiment only with predicted features there and find that they bring no practical improvement in performance.
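
To make the abstract's approach concrete, below is a minimal sketch (our illustration, not the authors' released code) of the general idea: morphological information, here POS tags, is embedded separately and concatenated with word embeddings before an LSTM sequence tagger. All names (e.g., MorphAwareLSTMTagger), dimensions, and the toy tagset and label counts are assumptions made for illustration only.

import torch
import torch.nn as nn

class MorphAwareLSTMTagger(nn.Module):
    """Sketch of an LSTM sequence tagger with extra morphological input."""
    def __init__(self, vocab_size, pos_tagset_size, num_labels,
                 word_dim=100, pos_dim=25, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Additional input: embeddings of POS tags (or universal features).
        self.pos_emb = nn.Embedding(pos_tagset_size, pos_dim)
        self.lstm = nn.LSTM(word_dim + pos_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, word_ids, pos_ids):
        # Concatenate word and POS embeddings per token, then tag each token.
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        out, _ = self.lstm(x)
        return self.classifier(out)  # (batch, seq_len, num_labels) logits

# Toy usage: one 5-token sentence with hypothetical word/POS indices.
model = MorphAwareLSTMTagger(vocab_size=1000, pos_tagset_size=18, num_labels=9)
words = torch.randint(0, 1000, (1, 5))
pos = torch.randint(0, 18, (1, 5))
logits = model(words, pos)  # shape: (1, 5, 9)

The same idea carries over to BERT-based models, for example by concatenating such feature embeddings with the transformer's contextual token representations before the task-specific classification layer; the paper evaluates both LSTM- and BERT-based variants.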

Type: Article
Copyright: © The Author(s), 2022. Published by Cambridge University Press

