Inductive probabilistic taxonomy learning using singular value decomposition

FRANCESCA FALLUCCHI; FABIO MASSIMO ZANZOTTO

doi:10.1017/S1351324910000197

Published online by Cambridge University Press: 05 January 2011

Affiliation (both authors): Department of Computer Science, Systems and Production, University of Rome “Tor Vergata”, Italy
Emails: fallucchi@info.uniroma2.it, zanzotto@info.uniroma2.it

Abstract

Capturing word meaning is one of the challenges of natural language processing (NLP). Formal models of meaning, such as networks of words or concepts, are knowledge repositories used in a variety of applications. To be used effectively, these networks have to be large or, at least, adapted to specific domains. Learning word meaning from texts is therefore an active area of research. Lexico-syntactic pattern methods are one possible solution. Yet these models do not use structural properties of the target semantic relations, e.g. transitivity, during learning. In this paper, we propose a novel lexico-syntactic pattern probabilistic method for learning taxonomies that explicitly models transitivity and naturally exploits vector space model techniques for reducing space dimensions. We define two probabilistic models: the direct probabilistic model and the induced probabilistic model. The first is estimated directly on observations over text collections. The second uses transitivity on the direct probabilistic model to induce probabilities of derived events. Within our probabilistic model, we also propose a novel way of using singular value decomposition as an unsupervised method for feature selection in estimating direct probabilities. We empirically show that the induced probabilistic taxonomy learning model outperforms state-of-the-art probabilistic models and that our unsupervised feature selection method improves performance.
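
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the function names `reduce_features` and `induce_transitive` are hypothetical, the SVD step is shown as a plain projection onto the top-k singular directions, and the transitive combination assumes a noisy-OR over independent two-step paths, which is a simplification chosen here for brevity.

```python
import numpy as np


def reduce_features(X, k):
    """Project the rows of X onto its top-k right singular vectors.

    X is assumed to be a (pairs x pattern-features) matrix; the result has
    k columns. U[:, :k] * s[:k] equals X @ Vt[:k].T.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def induce_transitive(P):
    """Combine direct is-a probabilities with two-step transitive paths.

    P[i, j] is the direct probability that i is-a j. Assumption (not from
    the paper): the direct event and each path i -> k -> j are independent,
    so they are combined with a noisy-OR.
    """
    n = P.shape[0]
    # path[i, k, j] = P[i, k] * P[k, j]
    path = P[:, :, None] * P[None, :, :]
    idx = np.arange(n)
    path[idx, idx, :] = 0.0   # skip intermediate k == i
    path[:, idx, idx] = 0.0   # skip intermediate k == j
    # Probability that neither the direct event nor any path holds.
    none_holds = (1.0 - P) * np.prod(1.0 - path, axis=1)
    return 1.0 - none_holds


if __name__ == "__main__":
    # Toy direct model over three words: 0 is-a 1 (0.9), 1 is-a 2 (0.8).
    P = np.zeros((3, 3))
    P[0, 1], P[1, 2] = 0.9, 0.8
    print(induce_transitive(P))
```

In the toy example, the pair (0, 2) has no direct evidence but receives an induced probability through the intermediate word 1, which is the kind of derived event the induced model is meant to capture.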

Type: Papers
Information: Natural Language Engineering, Volume 17, Issue 1, January 2011, pp. 71–94

DOI: https://doi.org/10.1017/S1351324910000197
Copyright: Copyright © Cambridge University Press 2011
