Products of weighted logic programs

SHAY B. COHEN; ROBERT J. SIMMONS; NOAH A. SMITH

doi:10.1017/S1471068410000529

Published online by Cambridge University Press: 28 January 2011

Affiliation (all authors): School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA (e-mail: scohen@cs.cmu.edu, rjsimmon@cs.cmu.edu, nasmith@cs.cmu.edu)

Abstract

Weighted logic programming, a generalization of bottom-up logic programming, is a well-suited framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm's output space, such as a path through a graph or a grammatical derivation, and are given a real-valued score (often interpreted as a probability) that depends on the real weights of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the product transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying product to weighted logic programs corresponding to simpler weighted logic programs. In addition, we show how the computation of Kullback–Leibler divergence, an information-theoretic measure, can be interpreted using product.
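
As a rough illustration of the idea in the abstract, the sketch below pairs two tiny weighted "path" programs and scores each paired proof by the product of its two scores. It is not the paper's construction: the edge encoding `(source, label, target, weight)`, the function names `best_score` and `product_edges`, the choice of the max-times semiring, and the label-matching side condition are all assumptions made for this example.

```python
from itertools import product as cartesian

# Hypothetical encoding (not from the paper): an edge is
# (source, label, target, weight), with weights in (0, 1].

def best_score(edges, start, final):
    """Max-times value of the goal item of the path program
         reach(start) = 1
         reach(q) max= reach(p) * w   for every edge (p, label, q, w)
    computed by naive bottom-up iteration to a fixed point."""
    reach = {start: 1.0}
    changed = True
    while changed:
        changed = False
        for p, _label, q, w in edges:
            if p in reach and reach[p] * w > reach.get(q, 0.0):
                reach[q] = reach[p] * w
                changed = True
    return reach.get(final, 0.0)

def product_edges(edges_a, edges_b):
    """Pair the axioms of two path programs. The side condition
    (labels must match) constrains which proofs may be paired;
    each paired edge carries the product of the two weights."""
    return [((p1, p2), l1, (q1, q2), w1 * w2)
            for (p1, l1, q1, w1), (p2, l2, q2, w2) in cartesian(edges_a, edges_b)
            if l1 == l2]

# Two toy programs over the same labels.
A = [(0, "a", 1, 0.5), (0, "b", 1, 0.4), (1, "c", 2, 1.0)]
B = [(0, "a", 1, 0.2), (0, "b", 1, 0.9), (1, "c", 2, 0.5)]

score_a = best_score(A, 0, 2)
score_b = best_score(B, 0, 2)
score_ab = best_score(product_edges(A, B), (0, 0), (2, 2))
print(score_a, score_b, score_ab)
```

In this toy case the best proof of A alone goes through label "a" (score 0.5), while the best paired proof goes through label "b" (score about 0.18), because the product score rewards proofs that both programs rate reasonably well. That is the "product of experts" behaviour the abstract describes.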

Keywords

weighted logic programming; program transformations; natural language processing

Type: Regular Papers
Information: Theory and Practice of Logic Programming, Volume 11, Special Issue 2-3: The 24th International Conference on Logic Programming (ICLP 2008), March 2011, pp. 263–296

DOI: https://doi.org/10.1017/S1471068410000529
Copyright: Copyright © Cambridge University Press 2011
