
Viterbi training in PRISM



VT (Viterbi training), or hard expectation maximization (EM), is an efficient method of parameter learning for probabilistic models with hidden variables. Given an observation y, it searches for a state of the hidden variables x that maximizes p(x,y | θ) by coordinate ascent on the parameters θ and x. In this paper we introduce VT to PRogramming In Statistical Modeling (PRISM), a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First, VT in PRISM converges faster than EM in PRISM because of VT's termination condition. Second, parameters learned by VT often show good prediction performance compared with those learned by EM. We conducted two parsing experiments with probabilistic grammars, learning parameters by a variety of inference methods, i.e. VT, EM, MAP and VB. VT achieved the best parsing accuracy among them in both experiments. We also conducted a similar experiment on classification tasks where, unlike with probabilistic grammars, the hidden variable is not the prediction target. We found that in such a case VT does not necessarily yield superior performance. Third, since VT always deals with the single probability of a single explanation, the Viterbi explanation, the exclusiveness condition imposed on PRISM programs is no longer required if we learn parameters by VT. Last but not least, because VT in PRISM is general and applicable to any PRISM program, it largely removes the need for the user to develop a specific VT algorithm for a specific model. Furthermore, since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers.
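The coordinate ascent described above can be illustrated with a minimal Python sketch of hard EM on a toy mixture of biased coins. This is neither PRISM code nor the authors' implementation. The model, the names `log_joint` and `viterbi_training`, the small pseudo-count used to keep probabilities away from 0 and 1, and the stopping rule (stop once the best hidden assignment no longer changes) are all assumptions made for illustration.

```python
import math


def log_joint(k, heads, flips, theta):
    """log p(x=k, y | theta), up to a binomial constant that does not depend on k."""
    bias = theta["bias"][k]
    return (math.log(theta["mix"][k])
            + heads * math.log(bias)
            + (flips - heads) * math.log(1.0 - bias))


def viterbi_training(observations, theta, max_iters=100, pseudo=1e-3):
    """Hard EM for a mixture of biased coins (illustrative toy model).

    observations: list of (heads, flips) pairs, one per trial.
    theta: {"mix": mixing weights, "bias": head probabilities}, one entry per coin.
    pseudo: hypothetical smoothing constant, not part of the source description.
    """
    num_coins = len(theta["mix"])
    assignment = None
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # x-step: for fixed theta, pick the hidden coin maximizing p(x, y | theta).
        new_assignment = [
            max(range(num_coins), key=lambda k: log_joint(k, h, n, theta))
            for h, n in observations
        ]
        # Assumed termination condition: the best assignment did not change.
        if new_assignment == assignment:
            break
        assignment = new_assignment

        # theta-step: for fixed x, re-estimate parameters from (smoothed) counts.
        members = [0.0] * num_coins
        heads = [0.0] * num_coins
        flips = [0.0] * num_coins
        for k, (h, n) in zip(assignment, observations):
            members[k] += 1
            heads[k] += h
            flips[k] += n
        total = len(observations) + num_coins * pseudo
        theta = {
            "mix": [(members[k] + pseudo) / total for k in range(num_coins)],
            "bias": [(heads[k] + pseudo) / (flips[k] + 2 * pseudo)
                     for k in range(num_coins)],
        }
    return theta, assignment, iteration


if __name__ == "__main__":
    data = [(9, 10), (8, 10), (1, 10), (2, 10), (10, 10), (0, 10)]
    start = {"mix": [0.5, 0.5], "bias": [0.6, 0.4]}
    print(viterbi_training(data, start))
```

Each pass alternates between choosing the single most probable hidden assignment and re-estimating the parameters from that assignment alone, whereas EM would weight every possible assignment by its posterior probability. Stopping as soon as the assignment repeats is one simple way to obtain a finite termination test.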




