
A linear algebraic approach to datalog evaluation

Published online by Cambridge University Press:  22 May 2017

TAISUKE SATO*
Affiliation:
AI Research Center, AIST / National Institute of Informatics, Tokyo, Japan (e-mail: satou.taisuke@aist.go.jp)

Abstract

We propose a fundamentally new approach to Datalog evaluation. Given a linear Datalog program DB written using $N$ constants and binary predicates, we first translate if-and-only-if completions of clauses in DB into a set $E_q(DB)$ of matrix equations with a non-linear operation, where relations in $M_{DB}$, the least Herbrand model of DB, are encoded as adjacency matrices. We then translate $E_q(DB)$ into another set of purely linear matrix equations $\tilde{E}_q(DB)$. It is proved that the least solution of $\tilde{E}_q(DB)$ in the sense of matrix ordering is converted to the least solution of $E_q(DB)$, and that the latter gives $M_{DB}$ as a set of adjacency matrices. Hence, computing the least solution of $\tilde{E}_q(DB)$ is equivalent to computing $M_{DB}$ specified by DB. For a class of tail-recursive programs and for some other types of programs, our approach achieves $O(N^3)$ time complexity irrespective of the number of variables in a clause, since only matrix operations costing $O(N^3)$ or less are used. We conducted two experiments that compute the least Herbrand models of linear Datalog programs. The first experiment computes the transitive closure of artificial data and of real network data taken from the Koblenz Network Collection. The second compares the proposed approach with state-of-the-art symbolic systems, including two Prolog systems and two ASP systems, in terms of computation time for a transitive closure program and a same-generation program. In this experiment, our linear algebraic approach is observed to run $10^1 \sim 10^4$ times faster than the symbolic systems when the data is not sparse. Our approach is inspired by the emergence of big knowledge graphs and is expected to contribute to the realization of rich and scalable logical inference for knowledge graphs.
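To make the idea concrete, the following is a minimal sketch for the transitive closure case only. It is not taken from the paper: the abstract does not spell out the equations, so the non-linear equation R = min1(E + E R), the linear equation X = eps E + eps E X, the scaling factor eps, and all function names are assumptions chosen for illustration. The first function iterates the non-linear equation to its least fixed point; the second solves the linear equation exactly and thresholds the result at zero. Both should yield the same adjacency matrix.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closure_by_iteration(edge):
    """Least solution of R = min1(E + E R), iterating from the zero matrix.

    min1 caps every entry at 1; it is the non-linear operation assumed here.
    """
    n = len(edge)
    r = [[0] * n for _ in range(n)]
    while True:
        prod = matmul(edge, r)
        nxt = [[min(1, edge[i][j] + prod[i][j]) for j in range(n)]
               for i in range(n)]
        if nxt == r:
            return r
        r = nxt


def closure_by_linear_solve(edge):
    """Solve (I - eps E) X = eps E exactly, then map positive entries to 1.

    eps = 1 / (1 + max row sum) is an assumed scaling that keeps
    I - eps E strictly diagonally dominant, hence invertible.
    """
    n = len(edge)
    eps = Fraction(1, 1 + max(sum(row) for row in edge))
    # Augmented matrix [I - eps E | eps E], reduced by Gauss-Jordan elimination.
    aug = [[Fraction(int(i == j)) - eps * edge[i][j] for j in range(n)]
           + [eps * edge[i][j] for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [[int(aug[i][n + j] > 0) for j in range(n)] for i in range(n)]


# Hypothetical example graph: edges 0 -> 1 and 1 -> 2, node 3 isolated.
edge = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(closure_by_iteration(edge))
print(closure_by_linear_solve(edge))
```

The linear solve works here because the exact solution equals the series eps E + (eps E)^2 + ..., whose entry (i, j) is positive exactly when some path leads from i to j. Gauss-Jordan elimination on an N x N system costs O(N^3) arithmetic operations, which matches the order of complexity the abstract cites for matrix operations.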

Type
Regular Papers
Copyright
Copyright © Cambridge University Press 2017 

