Detecting light verb constructions across languages

  • István Nagy T., Anita Rácz and Veronika Vincze

Abstract

Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses, typically denoting an event or an action. They exhibit special linguistic features, especially when regarded in a multilingual context. In this paper, we focus on the automatic detection of LVCs in raw text in four languages: English, German, Spanish, and Hungarian. First, we analyze the characteristics of LVCs from a linguistic point of view based on parallel corpus data. Then, we provide a standardized (i.e., language-independent) representation of LVCs that can be used in machine learning experiments. Next, we experiment with identifying LVCs in different languages: we exploit language adaptation techniques, which demonstrate that data from an additional language can be successfully employed to improve the performance of supervised LVC detection for a given language. As annotated corpora from several domains are available for English and Hungarian, we also investigate the effect of simple domain adaptation techniques in reducing the gap between domains. Furthermore, we combine domain adaptation techniques with language adaptation techniques for these two languages. Our results show that both out-of-domain data and additional-language data can improve performance. We believe that our language adaptation method may have practical implications in several fields of natural language processing, especially in machine translation.
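The idea of language adaptation over a language-independent representation can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the learner or the features, so the feature names (`light_verb`, `deverbal_noun`, `rel=obj`, `rel=obl`), the toy examples, the down-weighting factor, and the perceptron used here are all assumptions chosen for brevity. The sketch only shows how source-language examples might be pooled with target-language examples when both share one feature space.

```python
from collections import defaultdict

def pool(target, source, source_weight=0.5):
    """Combine target-language examples (weight 1.0) with
    down-weighted examples from an additional language."""
    return ([(feats, label, 1.0) for feats, label in target] +
            [(feats, label, source_weight) for feats, label in source])

def train_perceptron(examples, epochs=10):
    """Train a binary perceptron; each update is scaled by the example weight."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, label, weight in examples:
            pred = 1 if sum(w[f] for f in feats) > 0 else 0
            if pred != label:
                step = weight if label == 1 else -weight
                for f in feats:
                    w[f] += step
    return w

def predict(w, feats):
    """Return 1 (LVC) if the summed feature weights are positive, else 0."""
    return 1 if sum(w.get(f, 0.0) for f in feats) > 0 else 0

# Hypothetical verb-noun candidates described by language-independent features.
target = [  # small target-language training set
    (("light_verb", "deverbal_noun", "rel=obj"), 1),
    (("rel=obj",), 0),
]
source = [  # examples from an additional language
    (("light_verb", "deverbal_noun", "rel=obl"), 1),
    (("deverbal_noun", "rel=obl"), 0),
]

target_only_model = train_perceptron(pool(target, []))
pooled_model = train_perceptron(pool(target, source, source_weight=0.5))

# A candidate with a deverbal noun but no light verb.
candidate = ("deverbal_noun", "rel=obl")
print(predict(target_only_model, candidate), predict(pooled_model, candidate))
```

In this toy setting the target-only model has never seen a deverbal noun without a light verb, so it labels such a candidate as an LVC. The pooled model uses the negative example from the additional language to correct this. Real data would require held-out evaluation to determine whether pooling helps and how the source examples should be weighted.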

Corresponding author

*Corresponding author. Email: vinczev@inf.u-szeged.hu
