
Swift Markov Logic for Probabilistic Reasoning on Knowledge Graphs

Published online by Cambridge University Press: 09 November 2022

LUIGI BELLOMARINI
Affiliation:
Banca d’Italia, Rome, Italy (e-mails: luigi.bellomarini@bancaditalia.it, eleonora.laurenza@bancaditalia.it)
ELEONORA LAURENZA
Affiliation:
Banca d’Italia, Rome, Italy (e-mails: luigi.bellomarini@bancaditalia.it, eleonora.laurenza@bancaditalia.it)
EMANUEL SALLINGER
Affiliation:
TU Wien, Vienna, Austria; University of Oxford, United Kingdom (e-mail: sallinger@dbai.tuwien.ac.at)
EVGENY SHERKHONOV
Affiliation:
University of Oxford, United Kingdom (e-mail: evgeny.sherkhonov@cs.ox.ac.uk)

Abstract

We provide a framework for probabilistic reasoning in Vadalog-based Knowledge Graphs (KGs) that satisfies the requirements of ontological reasoning: full recursion, powerful existential quantification, and the expression of inductive definitions. Vadalog is a Knowledge Representation and Reasoning (KRR) language based on Warded Datalog+/–, a logical core language of existential rules with a good balance between computational complexity and expressive power. Handling uncertainty is essential for reasoning with KGs. Yet Vadalog and Warded Datalog+/– are not covered by existing probabilistic logic programming and statistical relational learning approaches, for several reasons, including insufficient support for recursion with existential quantification and the inability to express inductive definitions. In this work, we introduce Soft Vadalog, a probabilistic extension to Vadalog that satisfies these desiderata. A Soft Vadalog program induces what we call a Probabilistic Knowledge Graph (PKG), which consists of a probability distribution on a network of chase instances, the structures obtained by grounding the rules over a database using the chase procedure. We exploit PKGs for probabilistic marginal inference. We discuss the theory and present MCMC-chase, a Monte Carlo method for using Soft Vadalog in practice. We apply our framework to data management and industrial problems and evaluate it experimentally in the Vadalog system.

Type
Original Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press
