
Inference and learning in latent Markov models

Published online by Cambridge University Press:  05 October 2015

D. Barber
Affiliation:
University College London
S. Chiappa
Affiliation:
University of Cambridge
Zhe Chen
Affiliation:
New York University

Summary

Probabilistic time series models

A time series is an ordered collection of observations y1:T ≡ {y1, …, yT}. Typical tasks in time series analysis are the prediction of future observations (for example in weather forecasting) or the extraction of lower-dimensional information embedded in the observations (for example in automatic speech recognition). In neuroscience, common applications are related to the latter, for example the detection of epileptic events or artifacts from EEG recordings (Boashash & Mesbah 2001; Rohalova et al. 2001; Tarvainen et al. 2004; Chiappa & Barber 2005, 2006), or the detection of intention in a collection of neural recordings for the purpose of control (Wu et al. 2003). Time series models commonly make the assumptions that the recent past is more informative than the distant past and that the observations are obtained from a noisy measuring device or from an inherently stochastic system. Often, in models of physical systems, additional knowledge about the properties of the time series is built into the model, including any known physical laws or constraints; other forms of prior knowledge may relate to whether the process underlying the time series is discrete or continuous. Markov models are classical models that allow one to build such assumptions into a probabilistic framework.
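As a concrete sketch of the assumption that the recent past is most informative, consider a first-order Markov model over a discrete state space, where the joint distribution factorises as p(y1:T) = p(y1) ∏t p(yt | yt−1). The snippet below evaluates and ancestrally samples such a model; the two-state initial and transition probabilities are invented for illustration and do not come from the chapter.

```python
import numpy as np

# Illustrative (made-up) two-state first-order Markov model.
p_init = np.array([0.6, 0.4])          # p(y_1)
P = np.array([[0.9, 0.1],              # P[i, j] = p(y_t = j | y_{t-1} = i)
              [0.2, 0.8]])

def log_joint(y):
    """log p(y_{1:T}) under the factorisation p(y_1) * prod_t p(y_t | y_{t-1})."""
    lp = np.log(p_init[y[0]])
    for t in range(1, len(y)):
        lp += np.log(P[y[t - 1], y[t]])
    return lp

def sample(T, rng=np.random.default_rng(0)):
    """Draw y_{1:T} ancestrally: each state depends only on its predecessor."""
    y = [rng.choice(2, p=p_init)]
    for _ in range(T - 1):
        y.append(rng.choice(2, p=P[y[-1]]))
    return y

y = sample(10)
print(y, log_joint(y))
```

The key point is that `log_joint` never looks further back than one step: the distant past enters only through the most recent state.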

A graphical depiction

A probabilistic model of a time series y1:T is a joint distribution p(y1:T). Commonly, the structure of the model is chosen to be consistent with the causal nature of time. This is achieved using the rules of probability: the distribution of a variable x, given knowledge of the state of a variable y, is p(x|y) = p(x, y)/p(y), see for example Barber (2012). Here p(x, y) is the joint distribution of x and y, while p(y) = ∫ p(x, y)dx is the marginal distribution of y (i.e., the distribution of y when the state of x is unknown).
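These definitions can be checked numerically on a small discrete example. The joint table below is invented for illustration; marginalisation sums the joint over x, and conditioning divides the joint by the marginal, exactly as in p(x|y) = p(x, y)/p(y):

```python
import numpy as np

# Made-up joint distribution p(x, y) over x in {0, 1} (rows)
# and y in {0, 1} (columns); the entries sum to 1.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

# Marginal: p(y) = sum_x p(x, y)  (for discrete x the integral is a sum).
p_y = p_xy.sum(axis=0)

# Conditional: p(x | y) = p(x, y) / p(y), one column per value of y.
p_x_given_y = p_xy / p_y

print(p_y)            # marginal of y
print(p_x_given_y)    # each column sums to 1
```

Each column of `p_x_given_y` is itself a distribution over x, which is the sanity check that conditioning renormalises the joint.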

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2015


References

Abbott, L. F., Varela, J. A., Sen, K. & Nelson, S. B. (1997). Synaptic depression and cortical gain control. Science 275, 220–223.
Alspach, D. L. & Sorenson, H. W. (1972). Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control 17(4), 439–448.
Anderson, B. D. & Moore, J. B. (1979). Optimal Filtering, New Jersey: Prentice-Hall.
Barber, D. (2003a). Dynamic Bayesian networks with deterministic tables. In S. Becker, S. Thrun & K. Obermayer, eds, Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press, pp. 713–720.
Barber, D. (2003b). Learning in spiking neural assemblies. In S. Becker, S. Thrun & K. Obermayer, eds, Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press, pp. 149–156.
Barber, D. (2006). Expectation correction for an augmented class of switching linear Gaussian models. Journal of Machine Learning Research 7, 2515–2540.
Barber, D. (2012). Bayesian Reasoning and Machine Learning, Cambridge: Cambridge University Press.
Barber, D. & Cemgil, A. T. (2010). Graphical models for time series. IEEE Signal Processing Magazine 27(6), 18–28.
Barber, D. & Chiappa, S. (2007). Bayesian linear Gaussian state models for biosignal composition. IEEE Signal Processing Letters 14(4), 267–270.
Boashash, B. & Mesbah, M. (2001). A time-frequency approach for newborn seizure detection. IEEE Engineering in Medicine and Biology Magazine 20, 54–64.
Bracegirdle, C. & Barber, D. (2011). Switch-reset models: exact and approximate inference. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS).
Cappé, O., Godsill, S. J. & Moulines, E. (2007). An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE 95(5), 899–924.
Cappé, O., Moulines, E. & Ryden, T. (2005). Inference in Hidden Markov Models, New York: Springer.
Challis, E. & Barber, D. (2011). Concave Gaussian variational approximations for inference in large-scale Bayesian linear models. In AISTATS–JMLR Proceedings, pp. 199–207.
Chiappa, S. (2014). Explicit-duration Markov switching models. Foundations and Trends in Machine Learning 7(6), 803–886.
Chiappa, S. & Barber, D. (2005). Generative temporal ICA for classification in asynchronous BCI systems. In Proceedings of International Conference on Neural Engineering.
Chiappa, S. & Barber, D. (2006). EEG classification using generative independent component analysis. Neurocomputing 69, 769–777.
Chiappa, S. & Barber, D. (2007). Bayesian linear Gaussian state space models for biosignal decomposition. IEEE Signal Processing Letters 14(4), 267–270.
Churchland, P. S. & Sejnowski, T. J. (1994). The Computational Brain, Cambridge, MA: MIT Press.
Dayan, P. & Abbott, L. (2001). Theoretical Neuroscience, Cambridge, MA: MIT Press.
Dempster, A., Laird, N. & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38.
Doucet, A., Godsill, S. & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10(3), 197–208.
Doucet, A. & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: fifteen years later. In D. Crisan & B. Rozovsky, eds, Oxford Handbook of Nonlinear Filtering, Oxford: Oxford University Press.
Engle, R. F. (2001). GARCH 101: the use of ARCH/GARCH models in applied econometrics. Journal of Economic Perspectives 15(4), 157–168.
Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models, New York: Springer.
Gerstner, W. & Kistler, W. M. (2002). Spiking Neuron Models, Cambridge: Cambridge University Press.
Ghahramani, Z. & Hinton, G. E. (1998). Variational learning for switching state-space models. Neural Computation 12(4), 963–996.
Ghahramani, Z. & Jordan, M. I. (1995). Factorial hidden Markov models. In D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, Advances in Neural Information Processing Systems 8, Cambridge, MA: MIT Press, pp. 472–478.
Godsill, S. J., Doucet, A. & West, M. (2004). Monte Carlo smoothing for non-linear time series. Journal of the American Statistical Association 99, 156–168.
Hertz, J., Krogh, A. & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation, Reading, MA: Addison-Wesley Publishing Company.
Hughes, N., Roberts, S. & Tarassenko, L. (2004). Semi-supervised learning of probabilistic models for ECG segmentation. In Proceedings of IEEE Engineering in Medicine and Biology Society (EMBC), pp. 434–437.
Isard, M. & Blake, A. (1998). CONDENSATION: conditional density propagation for visual tracking. International Journal of Computer Vision 29, 5–28.
Kailath, T., Sayed, A. H. & Hassibi, B. (2000). Linear Estimation, Englewood Cliffs, NJ: Prentice Hall.
Kim, C.-J. (1994). Dynamic linear models with Markov-switching. Journal of Econometrics 60, 1–22.
Kim, C.-J. & Nelson, C. R. (1999). State-Space Models with Regime Switching, Cambridge, MA: MIT Press.
Kitagawa, G. (1994). The two-filter formula for smoothing and an implementation of the Gaussian-sum smoother. Annals of the Institute of Statistical Mathematics 46(4), 605–623.
Lerner, U., Parr, R., Koller, D. & Biswas, G. (2000). Bayesian fault detection and diagnosis in dynamic systems. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pp. 531–537.
Markram, H., Lubke, J., Frotscher, M. & Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275, 213–215.
Minka, T. (2001). Expectation propagation for approximate Bayesian inference. PhD thesis, MIT.
Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. CRG-TR-93-1, Dept. of Computer Science, University of Toronto.
Pfister, J.-P., Toyoizumi, T., Barber, D. & Gerstner, W. (2006). Optimal spike-timing dependent plasticity for precise action potential firing in supervised learning. Neural Computation 18, 1309–1339.
Quinn, J. A. & Williams, C. K. I. (2011). Physiological monitoring with factorial switching linear dynamical systems. In D. Barber, A. T. Cemgil & S. Chiappa, eds, Bayesian Time Series Models, Cambridge: Cambridge University Press.
Quinn, J. A., Williams, C. K. I. & McIntosh, N. (2009). Factorial switching linear dynamical systems applied to physiological condition monitoring. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 1537–1551.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286.
Rauch, H. E., Tung, F. & Striebel, C. T. (1965). Maximum likelihood estimates of linear dynamic systems. American Institute of Aeronautics and Astronautics Journal 3(8), 1445–1450.
Rohalova, M., Sykacek, P., Koska, M. & Dorffner, G. (2001). Detection of the EEG artifacts by the means of the (extended) Kalman filter. Measurement Science Review 1(1), 59–62.
Rubin, D. B. (1988). Using the SIR algorithm to simulate posterior distributions. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley & A. F. M. Smith, eds, Bayesian Statistics 3, Oxford: Oxford University Press.
Tarvainen, M. P., Hiltunen, J. K., Ranta-aho, P. O. & Karjalainen, P. A. (2004). Estimation of nonstationary EEG with Kalman smoother approach: an application to event-related synchronization (ERS). IEEE Transactions on Biomedical Engineering 51(3), 516–524.
Tsodyks, M., Pawelzik, K. & Markram, H. (1998). Neural networks with dynamic synapses. Neural Computation 10, 821–835.
Verhaegen, M. & Van Dooren, P. (1986). Numerical aspects of different Kalman filter implementations. IEEE Transactions on Automatic Control 31(10), 907–917.
Wainwright, M. & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1, 1–305.
West, M. & Harrison, J. (1997). Bayesian Forecasting and Dynamic Models, 2nd edn, New York: Springer-Verlag.
Wu, W., Serruya, M., Black, M. J., Gao, Y., Shaikhouni, A., Bienenstock, E. & Donoghue, J. P. (2003). Neural decoding of cursor motion using a Kalman filter. In S. Becker, S. Thrun & K. Obermayer, eds, Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press, pp. 133–140.
