
14 - Markov chain Monte Carlo algorithms for Gaussian processes

from V - Nonparametric models

Published online by Cambridge University Press:  07 September 2011

Michalis K. Titsias, University of Manchester
Magnus Rattray, University of Manchester
Neil D. Lawrence, University of Manchester
David Barber, University College London
A. Taylan Cemgil, Boğaziçi Üniversitesi, Istanbul
Silvia Chiappa, University of Cambridge

Summary

Introduction

Gaussian processes (GPs) have a long history in statistical physics and mathematical probability. Two of the most well-studied stochastic processes, Brownian motion [12, 47] and the Ornstein–Uhlenbeck process [43], are instances of GPs. In the context of regression and statistical learning, GPs have been used extensively in applications that arise in geostatistics and experimental design [26, 45, 7, 40]. More recently, in the machine learning literature, GPs have been considered as general estimation tools for solving problems such as non-linear regression and classification [29]. In the context of machine learning, GPs offer a flexible nonparametric Bayesian framework for estimating latent functions from data and they share similarities with neural networks [23] and kernel methods [35].

In standard GP regression, where the likelihood is Gaussian, the posterior over the latent function (given data and hyperparameters) is itself a GP and is obtained analytically. In all other cases, where the likelihood function is non-Gaussian, exact inference is intractable and approximate inference methods are needed. Deterministic approximate methods are currently widely used for inference in GP models [48, 16, 8, 29, 19, 34]. However, they are limited by the assumption that the likelihood function factorises. In addition, these methods usually do not treat the hyperparameters of the model (the parameters that appear in the likelihood and the kernel function) in a fully Bayesian way, providing only point estimates.
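
To make the distinction concrete, the short Python/NumPy sketch below (not part of the chapter; the kernel choice, hyperparameter values and toy data are illustrative assumptions) computes the closed-form GP regression posterior that exists when the likelihood is Gaussian. With a non-Gaussian likelihood no such closed form is available, and the latent function values and hyperparameters must instead be integrated out approximately, for example with the MCMC algorithms developed in this chapter.

```python
import numpy as np

def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel; hyperparameter values are illustrative."""
    sqdist = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_regression_posterior(X, y, X_star, noise_var=0.1):
    """Exact posterior mean and covariance at test inputs X_star for a GP prior
    with an RBF kernel and Gaussian likelihood (closed form, no sampling needed)."""
    K = rbf_kernel(X, X)                      # K(X, X)
    K_s = rbf_kernel(X, X_star)               # K(X, X*)
    K_ss = rbf_kernel(X_star, X_star)         # K(X*, X*)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sigma^2 I)^{-1} y
    mean = K_s.T @ alpha                       # posterior mean at X*
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                       # posterior covariance at X*
    return mean, cov

# Toy usage: noisy observations of a sine function (illustrative data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-3, 3, 50)[:, None]
mean, cov = gp_regression_posterior(X, y, X_star)
```

For a non-Gaussian likelihood (e.g. classification labels or count data), the term (K + sigma^2 I)^{-1} y above has no analogue, which is precisely what motivates sampling the latent function values and hyperparameters instead.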

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2011


References

[1] P. Abrahamsen. A review of Gaussian random fields and correlation functions. Technical Report 917, Norwegian Computing Center, 1997.
[2] R. P. Adams, I. Murray and D. J. C. MacKay. The Gaussian process density sampler. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 9–16, 2009.
[3] U. Alon. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman and Hall/CRC, 2006.
[4] C. Andrieu and J. Thoms. A tutorial on adaptive MCMC. Statistics and Computing, 18:343–373, 2008.
[5] M. Barenco, D. Tomescu, D. Brewer, J. Callard, R. Stark and M. Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3), 2006.
[6] O. F. Christensen, G. O. Roberts and M. Sköld. Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. Journal of Computational and Graphical Statistics, 15:1–17, 2006.
[7] N. A. C. Cressie. Statistics for Spatial Data. John Wiley & Sons, 1993.
[8] L. Csato and M. Opper. Sparse online Gaussian processes. Neural Computation, 14:641–668, 2002.
[9] P. J. Diggle, J. A. Tawn and R. A. Moyeed. Model-based geostatistics (with discussion). Applied Statistics, 47:299–350, 1998.
[10] J. L. Doob. Stochastic Processes. John Wiley & Sons, 1953.
[11] S. Duane, A. D. Kennedy, B. J. Pendleton and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
[12] A. Einstein. On the movement of small particles suspended in a stationary liquid by the molecular kinetic theory of heat. Dover Publications, 1905.
[13] P. Gao, A. Honkela, N. Lawrence and M. Rattray. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. In ECCB08, 2008.
[14] A. Gelman, J. Carlin, H. Stern and D. Rubin. Bayesian Data Analysis. Chapman and Hall, 2004.
[15] A. Gelman, G. O. Roberts and W. R. Gilks. Efficient Metropolis jumping rules. In Bayesian Statistics 5, 1996.
[16] M. N. Gibbs and D. J. C. MacKay. Variational Gaussian process classifiers. IEEE Transactions on Neural Networks, 11(6):1458–1464, 2000.
[17] W. R. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2):337–348, 1992.
[18] H. Haario, E. Saksman and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7:223–240, 2001.
[19] M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679–1704, 2005.
[20] N. D. Lawrence, G. Sanguinetti and M. Rattray. Modelling transcriptional regulation using Gaussian processes. In Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[21] N. D. Lawrence, M. Seeger and R. Herbrich. Fast sparse Gaussian process methods: the informative vector machine. In Advances in Neural Information Processing Systems. MIT Press, 2002.
[22] T. Minka. Expectation propagation for approximate Bayesian inference. In UAI, pages 362–369, 2001.
[23] R. M. Neal. Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118. Springer, 1996.
[24] R. M. Neal. Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical report, Dept. of Statistics, University of Toronto, 1997.
[25] R. M. Neal. Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In M. I. Jordan, editor, Learning in Graphical Models, pages 205–225. Kluwer Academic Publishers, 1998.
[26] A. O'Hagan. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society, Series B, 40(1):1–42, 1978.
[27] M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural Computation, 21(3), 2009.
[28] J. Quiñonero Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.
[29] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[30] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, 2nd edition, 2004.
[31] G. O. Roberts, A. Gelman and W. R. Gilks. Weak convergence and optimal scaling of random walk Metropolis algorithms. Annals of Applied Probability, 7:110–120, 1996.
[32] S. Rogers, R. Khanin and M. Girolami. Bayesian model-based inference of transcription factor activity. BMC Bioinformatics, 8(2), 2006.
[33] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability. Chapman & Hall, 2005.
[34] H. Rue, S. Martino and N. Chopin. Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 71(2):319–392, 2009.
[35] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
[36] M. Seeger. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, University of Edinburgh, July 2003.
[37] M. Seeger, C. K. I. Williams and N. D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In C. M. Bishop and B. J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[38] A. J. Smola and P. Bartlett. Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[39] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18. MIT Press, 2006.
[40] M. L. Stein. Interpolation of Spatial Data. Springer, 1999.
[41] M. K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In Twelfth International Conference on Artificial Intelligence and Statistics, JMLR: W&CP, volume 5, pages 567–574, 2009.
[42] M. K. Titsias, N. D. Lawrence and M. Rattray. Efficient sampling for Gaussian process inference using control variables. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1681–1688, 2009.
[43] G. E. Uhlenbeck and L. S. Ornstein. On the theory of Brownian motion. Physical Review, 36:823–841, 1930.
[44] J. Vanhatalo and A. Vehtari. Sparse log Gaussian processes via MCMC for spatial epidemiology. Journal of Machine Learning Research: Workshop and Conference Proceedings, 1:73–89, 2007.
[45] G. Wahba. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, 59, 1990.
[46] M. C. Wang and G. E. Uhlenbeck. On the theory of the Brownian motion II. Reviews of Modern Physics, 17(2–3):323–342, 1945.
[47] N. Wiener. Differential space. Journal of Mathematical Physics, 2:131–174, 1923.
[48] C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
