Book contents
- Frontmatter
- Contents
- Preface
- Notation and abbreviations
- Part I General discussion
- Part II Approximate inference
- 4 Maximum a-posteriori approximation
- 5 Evidence approximation
- 6 Asymptotic approximation
- 7 Variational Bayes
- 8 Markov chain Monte Carlo
- Appendix A Basic formulas
- Appendix B Vector and matrix formulas
- Appendix C Probabilistic distribution functions
- References
- Index
6 - Asymptotic approximation
from Part II - Approximate inference
Published online by Cambridge University Press: 05 August 2015
Summary
Asymptotic approximation is also widely used in practical Bayesian approaches (De Bruijn 1970) to obtain posterior distributions approximately. For example, as we discussed in Chapter 2, the posterior distribution of a model parameter p(Θ|O) and that of a model p(M|O) given observations O = {o_t ∈ ℝ^D | t = 1, …, T} are usually difficult to compute exactly. The approach assumes that we have enough data (i.e., T is sufficiently large), which makes Bayesian inference mathematically tractable. As particular examples of asymptotic approximations, we introduce the Laplace approximation and the Bayesian information criterion, which are widely used in speech and language processing.
The Laplace approximation is used to approximate a complex distribution by a Gaussian distribution (Kass & Raftery 1995, Bernardo & Smith 2009). It assumes that the posterior distribution is highly peaked around its maximum value, which corresponds to the mode of the posterior distribution. The posterior distribution is then modeled as a Gaussian distribution whose mean is that mode. By using this approximation, we can obtain the posterior distributions analytically to some extent. Section 6.1 first explains the Laplace approximation in general. In Sections 6.3 and 6.4 we also discuss use of the Laplace approximation for analytically obtaining Bayesian predictive distributions in acoustic modeling, and for a Bayesian extension of successful neural-network-based acoustic modeling, respectively.
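The idea can be illustrated with a small numerical sketch (not taken from the book): the mode of the posterior is found, and the curvature (second derivative of the negative log posterior) at the mode becomes the inverse variance of the approximating Gaussian. The unnormalized Gamma-shaped posterior below, and the function names, are illustrative choices, not the book's notation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative target: an unnormalized Gamma(a, b) posterior,
# p(theta) ∝ theta^(a-1) * exp(-b * theta), to be approximated
# by a Gaussian via the Laplace approximation.
a, b = 5.0, 2.0

def neg_log_post(theta):
    # Negative log of the unnormalized posterior.
    return -((a - 1.0) * np.log(theta) - b * theta)

# Step 1: locate the mode (the MAP estimate) numerically.
res = minimize_scalar(neg_log_post, bounds=(1e-6, 100.0), method="bounded")
mode = res.x  # analytic mode is (a - 1) / b = 2.0

# Step 2: the curvature of the negative log posterior at the mode
# gives the inverse variance of the approximating Gaussian
# (central finite difference for the second derivative).
eps = 1e-4
hess = (neg_log_post(mode + eps) - 2.0 * neg_log_post(mode)
        + neg_log_post(mode - eps)) / eps ** 2
var = 1.0 / hess  # analytic value is mode**2 / (a - 1) = 1.0

# The Laplace approximation is then N(theta | mode, var).
```

In multivariate problems the scalar second derivative is replaced by the Hessian matrix of the negative log posterior, and its inverse becomes the covariance of the approximating Gaussian.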
Another example of this asymptotic approximation is the Bayesian information criterion, or Schwarz criterion (Schwarz 1978). The Bayesian information criterion also assumes the large sample case, and approximates the posterior distribution of a model p(M|O) with a simple equation; because of this large sample assumption, it too is an instance of asymptotic approximation. Section 6.2 explains the Bayesian information criterion in general; it is used for model selection problems in speech processing. For example, Section 6.5 discusses the selection of an appropriate model structure for hidden Markov models, and Section 6.6 discusses estimating the number of speakers and detecting speech segments in conversations by regarding these tasks as model selection problems.
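As a minimal sketch of how such a criterion is used (the toy data and function names are illustrative, not from the book): the BIC score approximates the log marginal likelihood as the maximized log-likelihood minus a penalty of (k/2) log T, where k is the number of free parameters and T the number of observations, and the model with the larger score is selected.

```python
import numpy as np

def bic(log_likelihood, num_params, num_obs):
    # BIC approximation to the log marginal likelihood:
    # log p(O|M) ≈ log p(O | theta_hat, M) - (k / 2) * log T.
    return log_likelihood - 0.5 * num_params * np.log(num_obs)

def gauss_loglik(x, mu, var):
    # Log-likelihood of i.i.d. univariate Gaussian data.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

# Toy model selection: Gaussian data, comparing a fixed-variance model
# (1 free parameter: the mean) against a free-variance model
# (2 free parameters: mean and variance).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)
T = x.size

ll_fixed = gauss_loglik(x, x.mean(), 1.0)      # variance pinned at 1
ll_free = gauss_loglik(x, x.mean(), x.var())   # ML variance estimate

bic_fixed = bic(ll_fixed, 1, T)
bic_free = bic(ll_free, 2, T)
# The candidate with the larger BIC score is selected; the extra
# parameter of the free-variance model must buy enough likelihood
# to overcome its larger (k/2) log T penalty.
```

The free-variance model always attains at least as high a likelihood, so the penalty term is what lets BIC prefer the simpler model when the likelihood gain is small.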
Laplace approximation
This section first describes the basic theory of the Laplace approximation. We begin with a simple case where the model does not have a latent variable. We focus on the posterior distributions of model parameters, but the approximation can be applied to other continuous random variables as well.
Bayesian Speech and Language Processing, pp. 211-241. Publisher: Cambridge University Press. Print publication year: 2015.