
A linear algebraic approach to datalog evaluation

  • TAISUKE SATO

Abstract

We propose a fundamentally new approach to Datalog evaluation. Given a linear Datalog program $DB$ written using $N$ constants and binary predicates, we first translate the if-and-only-if completions of the clauses in $DB$ into a set $Eq(DB)$ of matrix equations with a non-linear operation, where relations in $M_{DB}$, the least Herbrand model of $DB$, are encoded as adjacency matrices. We then translate $Eq(DB)$ into another set of matrix equations, $\widetilde{Eq}(DB)$, which is purely linear. We prove that the least solution of $\widetilde{Eq}(DB)$ in the sense of matrix ordering can be converted to the least solution of $Eq(DB)$, and that the latter gives $M_{DB}$ as a set of adjacency matrices. Hence, computing the least solution of $\widetilde{Eq}(DB)$ is equivalent to computing $M_{DB}$ specified by $DB$. For a class of tail-recursive programs and for some other types of programs, our approach achieves $O(N^3)$ time complexity irrespective of the number of variables in a clause, since it uses only matrix operations costing $O(N^3)$ or less. We conducted two experiments that compute the least Herbrand models of linear Datalog programs. The first experiment computes the transitive closure of artificial data and of real network data taken from the Koblenz Network Collection. The second compares the proposed approach with state-of-the-art symbolic systems, including two Prolog systems and two ASP systems, in terms of computation time for a transitive closure program and the same generation program. In this comparison, we observed that our linear algebraic approach runs $10^1$ to $10^4$ times faster than the symbolic systems when the data is not sparse. Our approach is inspired by the emergence of big knowledge graphs and is expected to contribute to the realization of rich and scalable logical inference for knowledge graphs.
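As an illustration only, the sketch below applies the idea to a single program: a standard transitive closure program, assumed here to be `tc(X,Y) :- e(X,Y).` and `tc(X,Y) :- e(X,Z), tc(Z,Y).`, with the edge relation `e` encoded as an N×N 0/1 adjacency matrix A. The hypothetical function `tc_fixpoint` iterates the non-linear equation T = min1(A + A·T) from the zero matrix, where min1(x) = min(1, x) is an assumed thresholding operator standing in for the non-linear operation. The hypothetical function `tc_linear` instead solves the purely linear system (I − εA)X = A and thresholds X at 0. The choice ε = 1/(N+1) is an assumption that keeps ε below 1/ρ(A) (a 0/1 matrix has spectral radius at most N), so that (I − εA)^(-1) = Σ_k (εA)^k converges. This is a reconstruction under those assumptions, not the paper's general translation from completions to matrix equations.

```python
from fractions import Fraction


def tc_fixpoint(a):
    """Least solution of the non-linear equation T = min1(A + A.T), by iteration from 0."""
    n = len(a)
    t = [[0] * n for _ in range(n)]
    while True:
        # One application of T -> min1(A + A.T); min1 caps every entry at 1.
        new = [[min(1, a[i][j] + sum(a[i][k] * t[k][j] for k in range(n)))
                for j in range(n)]
               for i in range(n)]
        if new == t:  # least fixpoint reached
            return t
        t = new


def tc_linear(a):
    """Solve the linear system (I - eps*A) X = A exactly, then threshold X at 0."""
    n = len(a)
    # Assumption: eps < 1/spectral_radius(A); a 0/1 matrix has spectral radius <= n.
    eps = Fraction(1, n + 1)
    # Augmented matrix [I - eps*A | A] over the rationals.
    m = [[Fraction(int(i == j)) - eps * a[i][j] for j in range(n)]
         + [Fraction(a[i][j]) for j in range(n)]
         for i in range(n)]
    # Gauss-Jordan elimination: O(n^3) arithmetic operations.
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    # X = sum_{k>=0} eps^k A^(k+1): X[i][j] > 0 iff a path of length >= 1 joins i to j.
    return [[1 if m[i][n + j] > 0 else 0 for j in range(n)] for i in range(n)]


# Edges 0->1, 1->2, 2->1; node 3 is isolated.
a = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
assert tc_fixpoint(a) == tc_linear(a)
print(tc_linear(a))  # [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Both functions yield the same 0/1 matrix, but they reach it differently: the fixpoint iteration may need on the order of N rounds of O(N^3) work each, while the linear route performs one elimination costing O(N^3) arithmetic operations. Exact `Fraction` arithmetic is used here so that thresholding at 0 is not disturbed by rounding, at the cost of growing number sizes.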


References

Alviano, M., Faber, W., Leone, N., Perri, S., Pfeifer, G. and Terracina, G. 2010. The disjunctive datalog system DLV. In Datalog Reloaded, LNCS 6702, de Moor, O., Gottlob, G., Furche, T. and Sellers, A., Eds. Springer, Berlin, 282–301.
Bartels, R. and Stewart, G. 1972. Solution of the matrix equation AX + XB = C. Communications of the ACM 15, 9.
Bollacker, K., Evans, C., Paritosh, P., Sturge, T. and Taylor, J. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proc. of the 2008 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, 1247–1250.
Ceri, S., Gottlob, G. and Tanca, L. 1989. What you always wanted to know about datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering 1, 1, 146–166.
Cichocki, A., Zdunek, R., Phan, A.-H. and Amari, S. 2009. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Chichester, West Sussex, UK, John Wiley & Sons, Ltd.
Coppersmith, D. and Winograd, S. 1990. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation 9, 3, 251–280.
Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S. and Zhang, W. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proc. of 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD2014, ACM, New York, NY, USA, 601–610.
Gebser, M., Kaminski, R., Kaufmann, B. and Schaub, T. 2014. Clingo = ASP + Control: Preliminary report. In Technical Communications of the Thirtieth International Conference on Logic Programming (ICLP'14), Leuschel, M. and Schrijvers, T., Eds. Vol. 14(4–5), 1–9.
Golub, G., Nash, S. and Van Loan, C. 1979. A Hessenberg-Schur method for the problem AX + XB = C. IEEE Transactions on Automatic Control AC-24, 6, 909–913.
Granat, R., Jonsson, I. and Kågström, B. 2009. RECSY and SCASY library software: Recursive blocked and parallel algorithms for Sylvester-type matrix equations with some applications. In Parallel Scientific Computing and Optimization, Vol. 27, 3–24.
Grefenstette, E. 2013. Towards a formal distributional semantics: Simulating logical calculi with tensors. In Proc. of the 2nd Joint Conference on Lexical and Computational Semantics, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, 1–10.
Jonsson, I. and Kågström, B. 2002. Recursive blocked algorithms for solving triangular systems – Part II: Two-sided and generalized Sylvester and Lyapunov matrix equations. ACM Transactions on Mathematical Software 28, 4, 392–415.
Kolda, T. G. and Bader, B. W. 2009. Tensor decompositions and applications. SIAM Review 51, 3, 455–500.
Krompass, D., Nickel, M. and Tresp, V. 2014. Querying factorized probabilistic triple databases. In Proc. of the 13th International Semantic Web Conference (ISWC'14), Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K. and Goble, C., Eds. Springer-Verlag New York, Inc., New York, NY, USA, 114–129.
Kunegis, J. 2013. KONECT – The Koblenz network collection. In Proc. of the International Conference on World Wide Web Companion, ACM, New York, NY, USA, 1343–1350.
Lin, F. 2013. From Satisfiability to Linear Algebra. Technical report, Hong Kong University of Science and Technology.
Lloyd, J. 1993. Foundations of Logic Programming, 2nd ed. Springer-Verlag, New York.
Nickel, M. 2013. Tensor factorization for relational learning. PhD Thesis, Ludwig-Maximilians-Universität München.
Nickel, M., Murphy, K., Tresp, V. and Gabrilovich, E. 2015. A review of relational machine learning for knowledge graphs: From multi-relational link prediction to automated knowledge graph construction. CoRR abs/1503.00759. Also in Proceedings of the IEEE 104, 1, 11–33.
Rocktäschel, T., Bosnjak, M., Singh, S. and Riedel, S. 2014. Low-dimensional embeddings of logic. In Proc. of the ACL 2014 Workshop on Semantic Parsing (SP'14), Association for Computational Linguistics, 45–49.
Rocktäschel, T., Singh, S. and Riedel, S. 2015. Injecting logical background knowledge into embeddings for relation extraction. In Proc. of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Association for Computational Linguistics, 1119–1129.
Saberi, A., Stoorvogel, A. and Sannuti, P. 2007. Filtering Theory: With Applications to Fault Detection, Isolation, and Estimation. Birkhäuser, Boston.
Simoncini, V. 2013. Computational methods for linear matrix equations. Technical report. Also in SIAM Review 58, 3, 377–441.
Suchanek, F. M., Kasneci, G. and Weikum, G. 2007. YAGO: A core of semantic knowledge unifying WordNet and Wikipedia. In Proc. of the 16th International World Wide Web Conference (WWW'07), ACM, New York, NY, USA, 697–706.
Swift, T. and Warren, D. 2012. XSB: Extending Prolog with tabled logic programming. Theory and Practice of Logic Programming (TPLP) 12, 1–2, 157–187.
Tarjan, R. E. 1972. Depth-first search and linear graph algorithms. SIAM Journal on Computing 1, 2, 146–160.
Tekle, K. T. and Liu, Y. A. 2010. Precise complexity analysis for efficient datalog queries. In Proc. of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, ACM, New York, NY, USA, 35–44.
Warren, D. S. 1999. Programming in Tabled Prolog (very) DRAFT 1. Technical Report, Stony Brook University.
Yang, B., Yih, W., He, X., Gao, J. and Deng, L. 2015. Embedding entities and relations for learning and inference in knowledge bases. In Proc. of the International Conference on Learning Representations (ICLR) 2015.
Zhou, N.-F., Kameya, Y. and Sato, T. 2010. Mode-directed tabling for dynamic programming, machine learning, and constraint solving. In Proc. of the 22nd International Conference on Tools with Artificial Intelligence (ICTAI-2010), IEEE Computer Society, Washington DC, USA, 213–218.
