Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- 1 Scaling Up Machine Learning: Introduction
- Part One Frameworks for Scaling Up Machine Learning
- Part Two Supervised and Unsupervised Learning Algorithms
- 6 PSVM: Parallel Support Vector Machines with Incomplete Cholesky Factorization
- 7 Massive SVM Parallelization Using Hardware Accelerators
- 8 Large-Scale Learning to Rank Using Boosted Decision Trees
- 9 The Transform Regression Algorithm
- 10 Parallel Belief Propagation in Factor Graphs
- 11 Distributed Gibbs Sampling for Latent Variable Models
- 12 Large-Scale Spectral Clustering with MapReduce and MPI
- 13 Parallelizing Information-Theoretic Clustering Methods
- Part Three Alternative Learning Settings
- Part Four Applications
- Subject Index
- References
11 - Distributed Gibbs Sampling for Latent Variable Models
from Part Two - Supervised and Unsupervised Learning Algorithms
Published online by Cambridge University Press: 05 February 2012
Summary
In this chapter, we address distributed learning algorithms for statistical latent variable models, with a focus on topic models. Many high-dimensional datasets, such as text corpora and image databases, are too large to allow one to learn topic models on a single computer. Moreover, a growing number of applications require that inference be fast or in real time, motivating the exploration of parallel and distributed learning algorithms.
We begin by reviewing topic models such as Latent Dirichlet Allocation and Hierarchical Dirichlet Processes. We discuss parallel and distributed algorithms for learning these models and show that these algorithms can achieve substantial speedups without sacrificing model quality. Next we discuss practical guidelines for running our algorithms within various parallel computing frameworks and highlight complementary speedup techniques. Finally, we generalize our distributed approach to handle Bayesian networks.
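The distributed algorithms discussed in the chapter are built on collapsed Gibbs sampling, with documents partitioned across processors that sample concurrently and periodically synchronize their counts. As a concrete illustration only, here is a minimal single-machine sketch in the spirit of AD-LDA, with simulated processors standing in for real MPI ranks and hypothetical function names throughout; it is not the chapter's implementation:

```python
import numpy as np

def distributed_lda_gibbs(docs, V, K, n_procs=4, n_iters=50,
                          alpha=0.1, beta=0.01, seed=0):
    """AD-LDA-style sketch: partition documents across simulated
    processors, run local collapsed Gibbs sweeps against per-processor
    copies of the topic-word counts, then merge the copies back into
    the global table after each sweep."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(K, size=len(d)) for d in docs]   # topic assignments
    nkw = np.zeros((K, V))                             # global topic-word counts
    ndk = np.zeros((len(docs), K))                     # document-topic counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            nkw[z[d][i], w] += 1
            ndk[d, z[d][i]] += 1
    parts = np.array_split(np.arange(len(docs)), n_procs)

    for _ in range(n_iters):
        local = [nkw.copy() for _ in range(n_procs)]   # stale local copies
        for p, doc_ids in enumerate(parts):            # "processors", run in sequence here
            cnt = local[p]
            for d in doc_ids:
                for i, w in enumerate(docs[d]):
                    k = z[d][i]
                    cnt[k, w] -= 1; ndk[d, k] -= 1     # remove current token
                    # Collapsed Gibbs conditional (Griffiths-Steyvers form).
                    probs = (cnt[:, w] + beta) / (cnt.sum(axis=1) + V * beta) \
                            * (ndk[d] + alpha)
                    k = int(rng.choice(K, p=probs / probs.sum()))
                    z[d][i] = k
                    cnt[k, w] += 1; ndk[d, k] += 1
        # Synchronization: fold each processor's count changes into the global table.
        nkw = nkw + sum(lc - nkw for lc in local)
    return z, nkw, ndk
```

In a real deployment each partition would run on its own MPI rank or map task and the merge would be an all-reduce over the count differences; the approximation lies in each processor sampling against slightly stale counts between synchronizations.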
Several of the results in this chapter have appeared in previous papers in the specific context of topic modeling. The goal of this chapter is to present a comprehensive overview of distributed inference algorithms and to extend the general ideas to a broader class of Bayesian networks.
Latent Variable Models
Latent variable models are a class of statistical models that explain observed data with latent (or hidden) variables. Topic models and hidden Markov models are two examples of such models, where the latent variables are the topic assignments and the hidden states, respectively. Given observed data, the goal is to infer the posterior distribution over the latent variables and to use the learned model to make predictions on new data.
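In LDA, for example, this inference is commonly performed with collapsed Gibbs sampling, which resamples each token's topic assignment conditioned on all the others. A standard form of the per-token conditional is shown below; the notation follows common usage (e.g., Griffiths and Steyvers) and is not taken from the chapter text:

$$
p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}) \;\propto\;
\frac{n^{(w_i)}_{k,\neg i} + \beta}{n^{(\cdot)}_{k,\neg i} + V\beta}
\left( n^{(d_i)}_{k,\neg i} + \alpha \right),
$$

where $n^{(w_i)}_{k,\neg i}$ is the number of times word $w_i$ is assigned to topic $k$ excluding token $i$, $n^{(\cdot)}_{k,\neg i}$ is the total number of tokens assigned to topic $k$, $n^{(d_i)}_{k,\neg i}$ is the number of tokens in document $d_i$ assigned to topic $k$, $V$ is the vocabulary size, and $\alpha$, $\beta$ are the Dirichlet hyperparameters. Distributing this sampler across processors is the central concern of the chapter.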
- Type: Chapter
- Information: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 217-239. Publisher: Cambridge University Press. Print publication year: 2011