Unsupervised learning of semantic representation for documents with the law of total probability

Published online by Cambridge University Press:  02 November 2017

YANG WEI
Affiliation:
College of Computer and Control Engineering, Nankai University, Tianjin, China; Institute of Electronics, Chinese Academy of Sciences, Beijing, China
JINMAO WEI
Affiliation:
College of Computer and Control Engineering, Nankai University, Tianjin, China
ZHENGLU YANG
Affiliation:
College of Computer and Control Engineering, Nankai University, Tianjin, China
E-mails: wueerfu@hotmail.com, weijm@nankai.edu.cn, yangzl@nankai.edu.cn

Abstract

Representing the semantic information of documents is the basis for many applications, such as document summarization, web search, and text analysis. Although many studies have addressed this problem by enriching document vectors with the relatedness of the words involved, performance remains far from satisfactory because the physical boundaries of documents hinder the evaluation of relatedness between words. To address this problem, we propose an approach that infers the implicit relatedness between words via their common related words. To avoid overestimating the implicit relatedness, we restrict the inference in terms of the marginal probabilities of the words, based on the law of total probability. We confirm theoretically and experimentally that the proposed method measures the relatedness between words. Evaluation on real datasets shows that the proposed method significantly improves document clustering compared with state-of-the-art methods.
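
For reference, the law of total probability named in the abstract can be stated in its general textbook form as follows. The symbols $A$ and $B_1, \dots, B_n$ are generic events chosen here for illustration only; the abstract does not give the authors' notation or say exactly how the constraint is applied to word relatedness.

```latex
% General statement of the law of total probability (textbook form, not the authors' formulation)
P(A) = \sum_{k=1}^{n} P(A \mid B_k)\, P(B_k),
\qquad \text{where } B_1, \dots, B_n \text{ are mutually exclusive and exhaustive events with } P(B_k) > 0 .
```

One plausible reading is that the marginal probability on the left-hand side limits what the conditional terms on the right can contribute in total, which would be a natural way to keep inferred relatedness from being overestimated.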

Type
Article
Copyright
Copyright © Cambridge University Press 2017 

