
14 - Markov chain Monte Carlo algorithms for Gaussian processes

from V - Nonparametric models

Published online by Cambridge University Press:  07 September 2011

Michalis K. Titsias, University of Manchester
Magnus Rattray, University of Manchester
Neil D. Lawrence, University of Manchester
David Barber, University College London
A. Taylan Cemgil, Boğaziçi Üniversitesi, Istanbul
Silvia Chiappa, University of Cambridge

Summary

Introduction

Gaussian processes (GPs) have a long history in statistical physics and mathematical probability. Two of the most well-studied stochastic processes, Brownian motion [12, 47] and the Ornstein–Uhlenbeck process [43], are instances of GPs. In the context of regression and statistical learning, GPs have been used extensively in applications that arise in geostatistics and experimental design [26, 45, 7, 40]. More recently, in the machine learning literature, GPs have been considered as general estimation tools for solving problems such as non-linear regression and classification [29]. In the context of machine learning, GPs offer a flexible nonparametric Bayesian framework for estimating latent functions from data and they share similarities with neural networks [23] and kernel methods [35].

In standard GP regression, where the likelihood is Gaussian, the posterior over the latent function (given data and hyperparameters) is itself a GP and is obtained analytically. In all other cases, where the likelihood function is non-Gaussian, exact inference is intractable and approximate inference methods are needed. Deterministic approximate methods are currently widely used for inference in GP models [48, 16, 8, 29, 19, 34]. However, they are limited by the assumption that the likelihood function factorises. In addition, these methods usually do not treat the hyperparameters of the model (the parameters that appear in the likelihood and the kernel function) in a fully Bayesian way, providing only point estimates.
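
To make the distinction concrete, the short Python/NumPy sketch below (not part of the chapter; the kernel choice, hyperparameter values and toy data are illustrative assumptions) computes the closed-form GP regression posterior that exists when the likelihood is Gaussian. With a non-Gaussian likelihood no such closed form is available, and the latent function values and hyperparameters must instead be integrated out approximately, for example with the MCMC algorithms developed in this chapter.

```python
import numpy as np

def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel; hyperparameter values are illustrative."""
    sqdist = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_regression_posterior(X, y, X_star, noise_var=0.1):
    """Exact posterior mean and covariance at test inputs X_star for a GP prior
    with an RBF kernel and Gaussian likelihood (closed form, no sampling needed)."""
    K = rbf_kernel(X, X)                      # K(X, X)
    K_s = rbf_kernel(X, X_star)               # K(X, X*)
    K_ss = rbf_kernel(X_star, X_star)         # K(X*, X*)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sigma^2 I)^{-1} y
    mean = K_s.T @ alpha                       # posterior mean at X*
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                       # posterior covariance at X*
    return mean, cov

# Toy usage: noisy observations of a sine function (illustrative data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-3, 3, 50)[:, None]
mean, cov = gp_regression_posterior(X, y, X_star)
```

For a non-Gaussian likelihood (e.g. classification labels or count data), the term (K + sigma^2 I)^{-1} y above has no analogue, which is precisely what motivates sampling the latent function values and hyperparameters instead.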

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2011


References

[1] P. Abrahamsen. A review of Gaussian random fields and correlation functions. Technical Report 917, Norwegian Computing Center, 1997.
[2] R. P. Adams, I. Murray and D. J. C. MacKay. The Gaussian process density sampler. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 9–16, 2009.
[3] U. Alon. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman and Hall/CRC, 2006.
[4] C. Andrieu and J. Thoms. A tutorial on adaptive MCMC. Statistics and Computing, 18:343–373, 2008.
[5] M. Barenco, D. Tomescu, D. Brewer, J. Callard, R. Stark and M. Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3), 2006.
[6] O. F. Christensen, G. O. Roberts and M. Sköld. Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. Journal of Computational and Graphical Statistics, 15:1–17, 2006.
[7] N. A. C. Cressie. Statistics for Spatial Data. John Wiley & Sons, 1993.
[8] L. Csato and M. Opper. Sparse online Gaussian processes. Neural Computation, 14:641–668, 2002.
[9] P. J. Diggle, J. A. Tawn and R. A. Moyeed. Model-based geostatistics (with discussion). Applied Statistics, 47:299–350, 1998.
[10] J. L. Doob. Stochastic Processes. John Wiley & Sons, 1953.
[11] S. Duane, A. D. Kennedy, B. J. Pendleton and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
[12] A. Einstein. On the movement of small particles suspended in a stationary liquid by the molecular kinetic theory of heat. Dover Publications, 1905.
[13] P. Gao, A. Honkela, N. Lawrence and M. Rattray. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. In ECCB08, 2008.
[14] A. Gelman, J. Carlin, H. Stern and D. Rubin. Bayesian Data Analysis. Chapman and Hall, 2004.
[15] A. Gelman, G. O. Roberts and W. R. Gilks. Efficient Metropolis jumping rules. In Bayesian Statistics 5, 1996.
[16] M. N. Gibbs and D. J. C. MacKay. Variational Gaussian process classifiers. IEEE Transactions on Neural Networks, 11(6):1458–1464, 2000.
[17] W. R. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2):337–348, 1992.
[18] H. Haario, E. Saksman and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7:223–240, 2001.
[19] M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679–1704, 2005.
[20] N. D. Lawrence, G. Sanguinetti and M. Rattray. Modelling transcriptional regulation using Gaussian processes. In Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[21] N. D. Lawrence, M. Seeger and R. Herbrich. Fast sparse Gaussian process methods: the informative vector machine. In Advances in Neural Information Processing Systems. MIT Press, 2002.
[22] T. Minka. Expectation propagation for approximate Bayesian inference. In UAI, pages 362–369, 2001.
[23] R. M. Neal. Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118. Springer, 1996.
[24] R. M. Neal. Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical report, Dept. of Statistics, University of Toronto, 1997.
[25] R. M. Neal. Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In M. I. Jordan, editor, Learning in Graphical Models, pages 205–225. Kluwer Academic Publishers, 1998.
[26] A. O'Hagan. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society, Series B, 40(1):1–42, 1978.
[27] M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural Computation, 21(3), 2009.
[28] J. Quiñonero Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.
[29] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[30] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, 2nd edition, 2004.
[31] G. O. Roberts, A. Gelman and W. R. Gilks. Weak convergence and optimal scaling of random walk Metropolis algorithms. Annals of Applied Probability, 7:110–120, 1996.
[32] S. Rogers, R. Khanin and M. Girolami. Bayesian model-based inference of transcription factor activity. BMC Bioinformatics, 8(2), 2006.
[33] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability. Chapman & Hall, 2005.
[34] H. Rue, S. Martino and N. Chopin. Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 71(2):319–392, 2009.
[35] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
[36] M. Seeger. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, University of Edinburgh, July 2003.
[37] M. Seeger, C. K. I. Williams and N. D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In C. M. Bishop and B. J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[38] A. J. Smola and P. Bartlett. Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[39] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18. MIT Press, 2006.
[40] M. L. Stein. Interpolation of Spatial Data. Springer, 1999.
[41] M. K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In Twelfth International Conference on Artificial Intelligence and Statistics, JMLR: W&CP, volume 5, pages 567–574, 2009.
[42] M. K. Titsias, N. D. Lawrence and M. Rattray. Efficient sampling for Gaussian process inference using control variables. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1681–1688, 2009.
[43] G. E. Uhlenbeck and L. S. Ornstein. On the theory of Brownian motion. Physical Review, 36:823–841, 1930.
[44] J. Vanhatalo and A. Vehtari. Sparse log Gaussian processes via MCMC for spatial epidemiology. Journal of Machine Learning Research: Workshop and Conference Proceedings, 1:73–89, 2007.
[45] G. Wahba. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, 59, 1990.
[46] M. C. Wang and G. E. Uhlenbeck. On the theory of the Brownian motion II. Reviews of Modern Physics, 17(2–3):323–342, 1945.
[47] N. Wiener. Differential space. Journal of Mathematical Physics, 2:131–174, 1923.
[48] C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
