Lexical acquisition and semantic space models: Learning the semantics of unknown words

KOSTADIN CHOLAKOV

doi:10.1017/S1351324913000053

Published online by Cambridge University Press: 05 March 2013

Affiliation: University of Groningen, Oude Kijk in 't Jatstraat 26, 9712EK Groningen, The Netherlands. E-mail: k.cholakov@rug.nl

Abstract

Recent studies have shown that syntax-based semantic space models outperform models that represent context as a bag of words in several semantic analysis tasks. This advantage is generally attributed to the fact that syntax-based models use corpora that have been syntactically annotated by a parser and a computational grammar. However, if the corpora contain words unknown to the parser and the grammar, a syntax-based model may lose its advantage, since the syntactic properties of such words are unavailable. Bag-of-words models do not face this issue: they operate on raw, non-annotated corpora and are thus more robust. In this paper, we compare the performance of syntax-based and bag-of-words models on the task of learning the semantics of unknown words. In our experiments, unknown words are those not known to the Alpino parser and grammar of Dutch, and the semantics of an unknown word is defined by finding its most similar word in Cornetto, a Dutch lexico-semantic hierarchy. We show that, for unknown words, the syntax-based model performs worse than the bag-of-words approach. Furthermore, we show that if the syntactic properties of unknown words are first learned by an appropriate lexical acquisition method, the syntax-based model does outperform the bag-of-words approach. We conclude that, for words unknown to a given grammar, a bag-of-words model is more robust than a syntax-based model, but that the combination of lexical acquisition and syntax-based semantic models is best suited for learning the semantics of unknown words.
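
The contrast between the two kinds of semantic space can be illustrated with a minimal sketch. It is not the implementation used in the article: the toy English corpus, the hand-written dependency triples, the invented unknown word `blick`, the window size, the use of raw counts, and the choice of cosine similarity are all assumptions made for illustration, and the function names are hypothetical. The sketch only shows the general idea of representing a word by its contexts and then picking the most similar known word.

```python
from collections import Counter, defaultdict
from math import sqrt


def bow_vectors(sentences, window=2):
    """Bag-of-words space: count tokens within a symmetric window."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def syntax_vectors(triples):
    """Syntax-based space: count (relation, word) features taken from
    (head, relation, dependent) triples, in both directions."""
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[head][(rel, dep)] += 1
        vectors[dep][(rel + "^-1", head)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, candidates):
    """Return the candidate whose vector is closest to that of `word`."""
    return max(candidates, key=lambda c: cosine(vectors[word], vectors[c]))


# Toy data (assumed): "blick" stands in for a word unknown to the grammar.
sentences = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "blick", "chased", "the", "cat"],
    ["the", "car", "needs", "new", "tires"],
]
# Hand-written triples standing in for parser output (assumed).
triples = [
    ("chased", "subj", "dog"), ("chased", "obj", "cat"),
    ("chased", "subj", "blick"),
    ("needs", "subj", "car"), ("needs", "obj", "tires"),
]
known_words = ["dog", "car"]  # stand-in for entries of a lexical hierarchy

bow = bow_vectors(sentences, window=2)
syn = syntax_vectors(triples)
print(most_similar("blick", bow, known_words))
print(most_similar("blick", syn, known_words))
```

In this sketch the syntax-based vectors exist only because the triples for `blick` were supplied by hand. When a parser cannot analyse a word, such triples may be missing or unreliable, which is the situation the abstract describes; the bag-of-words vectors need only the raw tokens.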

Type: Articles
Information: Natural Language Engineering, Volume 20, Issue 4, October 2014, pp. 537–555

DOI: https://doi.org/10.1017/S1351324913000053
Copyright: Copyright © Cambridge University Press 2013
