Finding a domain-appropriate sense inventory for semantically tagging a corpus
Published online by Cambridge University Press: 01 December 1998
Abstract
Semantically tagging a corpus is useful for many intermediate NLP tasks, such as the acquisition of word argument structures in sublanguages, the acquisition of syntactic disambiguation cues, and terminology learning. The general idea is that semantic tags allow observed word patterns to be generalized, and facilitate the discovery of recurrent sublanguage phenomena and selectional rules of various types. Yet, in contrast to POS tags in morphology, there is no consensus in the literature about the type and granularity of the semantic tags to be used. In this paper, we argue that an appropriate selection of semantic tags should be domain-dependent. We propose a method by which we select from WordNet an inventory of semantic tags that is ‘optimal’ for a given corpus, according to a scoring function defined as a linear combination of general and corpus-dependent performance factors. We believe that an optimal selection of a category inventory is a necessary premise for obtaining better results in all lexical learning algorithms that are based on, or concerned with, the semantic categorization of words. Furthermore, an adequate inventory (one which intuitively ‘fits’ the semantics of a domain, e.g. ‘phenomenon’ for Natural Science, or ‘part’ and ‘piece’ for a technical handbook) may facilitate the manual annotation of large corpora.
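To make the idea of a scoring function defined as a linear combination of performance factors concrete, the following is a minimal sketch only, not the method of the paper. The factor names (`generality`, `coverage`, `discrimination`), the weights, and the candidate inventories are all hypothetical placeholders chosen for illustration; the abstract does not specify them.

```python
from typing import Dict

# A mapping from factor name to value. All names used here are hypothetical.
Factors = Dict[str, float]


def score_inventory(factors: Factors, weights: Factors) -> float:
    """Linear combination: sum of weight * factor value over the weighted factors."""
    return sum(weights[name] * factors[name] for name in weights)


def best_inventory(candidates: Dict[str, Factors], weights: Factors) -> str:
    """Return the name of the candidate inventory with the highest score."""
    return max(candidates, key=lambda name: score_inventory(candidates[name], weights))


# Assumed weights for the linear combination (illustrative values only).
weights = {"generality": 0.2, "coverage": 0.3, "discrimination": 0.5}

# Assumed factor values for three candidate tag inventories of different
# granularity (illustrative values only, not measured on any corpus).
candidates = {
    "coarse": {"generality": 0.9, "coverage": 0.8, "discrimination": 0.2},
    "medium": {"generality": 0.6, "coverage": 0.7, "discrimination": 0.7},
    "fine": {"generality": 0.2, "coverage": 0.5, "discrimination": 0.9},
}

if __name__ == "__main__":
    for name, factors in candidates.items():
        print(name, round(score_inventory(factors, weights), 3))
    print("selected:", best_inventory(candidates, weights))
```

In this toy setting the selected inventory depends entirely on the assumed weights; changing them shifts the preferred granularity, which is the sense in which such a selection can be made corpus-dependent.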
- Type: Research Article
- Copyright: © 1998 Cambridge University Press