Unsupervised learning of semantic representation for documents with the law of total probability

YANG WEI; JINMAO WEI; ZHENGLU YANG

doi:10.1017/S1351324917000420

Published online by Cambridge University Press: 02 November 2017


YANG WEI: College of Computer and Control Engineering, Nankai University, Tianjin, China; Institute of Electronics, Chinese Academy of Sciences, Beijing, China
JINMAO WEI: College of Computer and Control Engineering, Nankai University, Tianjin, China
ZHENGLU YANG: College of Computer and Control Engineering, Nankai University, Tianjin, China
e-mails: wueerfu@hotmail.com, weijm@nankai.edu.cn, yangzl@nankai.edu.cn

Abstract

The semantic information of documents needs to be represented because it is the basis for many applications, such as document summarization, web search, and text analysis. Although many studies have explored this problem by enriching document vectors with the relatedness of the words involved, performance remains far from satisfactory because the physical boundaries of documents hinder the evaluation of the relatedness between words. To address this problem, we propose an approach that further infers the implicit relatedness between words via their common related words. To avoid overestimating the implicit relatedness, we restrict the inference in terms of the marginal probabilities of the words, based on the law of total probability. We confirm theoretically and experimentally that the proposed method measures the relatedness between words. Thorough evaluation on real datasets shows that the proposed method significantly improves document clustering compared with state-of-the-art methods.
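
The idea of inferring implicit relatedness through common related words, while staying consistent with the words' marginal probabilities, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It is a minimal toy under stated assumptions: the word list, the symmetric co-occurrence counts, and the function names are all hypothetical, and relatedness is modelled simply as a conditional probability estimated from those counts. Two words that never co-occur directly ("car" and "fuel") receive a non-zero implicit relatedness through their shared neighbour ("engine"), and the law of total probability is then used to check that the inferred values still reproduce the original marginals.

```python
# Illustrative sketch only; NOT the method from the paper.
# Hypothetical symmetric co-occurrence counts between four words.
WORDS = ["car", "engine", "fuel", "banana"]
COUNTS = [
    [0, 4, 0, 0],  # car co-occurs only with engine
    [4, 0, 3, 0],  # engine co-occurs with car and fuel
    [0, 3, 0, 1],  # fuel co-occurs with engine and banana
    [0, 0, 1, 0],  # banana co-occurs only with fuel
]


def marginals(counts):
    """P(w_i): share of all co-occurrence mass that involves word i."""
    total = sum(sum(row) for row in counts)
    return [sum(row) / total for row in counts]


def conditionals(counts):
    """cond[i][k] = P(w_k | w_i), estimated by normalising row i."""
    return [[c / sum(row) for c in row] for row in counts]


def implicit_relatedness(cond):
    """inferred[i][j] = sum_k P(w_j | w_k) * P(w_k | w_i).

    Word k acts as a common related word linking i and j.
    """
    n = len(cond)
    return [
        [sum(cond[i][k] * cond[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def total_probability(cond, marg):
    """Law of total probability: sum_i P(w_j | w_i) * P(w_i) for every j."""
    n = len(marg)
    return [sum(cond[i][j] * marg[i] for i in range(n)) for j in range(n)]


marg = marginals(COUNTS)
cond = conditionals(COUNTS)
inferred = implicit_relatedness(cond)

car, fuel = WORDS.index("car"), WORDS.index("fuel")
direct_car_fuel = cond[car][fuel]        # zero: the pair never co-occurs
implicit_car_fuel = inferred[car][fuel]  # non-zero: linked through "engine"
recovered = total_probability(inferred, marg)  # should match marg
```

In this toy the inferred car-to-fuel value is 3/7, and `recovered` equals `marg` up to rounding, so the inferred conditionals do not inflate any word's overall probability. That consistency check is one plausible reading of "restricting the inference in terms of the marginal probabilities"; the paper's actual constraint may differ.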

Type: Article
Information: Natural Language Engineering, Volume 24, Issue 4, July 2018, pp. 491-522

DOI: https://doi.org/10.1017/S1351324917000420
Copyright: Copyright © Cambridge University Press 2017
