The goal of this chapter is to introduce methods that can be applied when the posterior distribution is not analytically tractable. Consider a Beta distribution with parameters α and β. We know from Equation 6.13 that its mean is α/(α + β). But what if we did not know that equation? We could still obtain the mean of a Beta distribution by drawing a huge sample from it in R and computing the sample mean with the single command mean(rbeta(n, alpha, beta)), where n is a suitably large number. As long as we can sample from a distribution, a sufficiently large sample lets us understand many attributes of that distribution even without an explicit formula for it. This general sampling approach, in which an unknown distribution is approximated by a large sample, underlies all Monte Carlo methods of Bayesian inference, although the methods vary considerably in what they assume we do and do not know about the underlying distributions. Table 7.1 summarizes and compares the main approaches to Bayesian inference and parameter estimation presented in this chapter. The table clarifies that the methods from the previous chapter can be applied only with complete knowledge of all distributions involved; when that knowledge is limited, we can resort to the sampling methods outlined in this chapter.
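The one-line command above can be expanded into a brief sketch that checks the Monte Carlo estimate against the analytic mean; the particular values α = 2 and β = 5, the seed, and the sample size are illustrative choices, not from the text:

```r
# Sketch: approximate the mean of a Beta(2, 5) distribution by sampling.
# The analytic mean is alpha / (alpha + beta) = 2/7; the parameter values
# below are illustrative assumptions, not taken from the text.
set.seed(1)        # for reproducibility
alpha <- 2
beta  <- 5
n     <- 1e6       # a suitably large sample size

draws <- rbeta(n, alpha, beta)

mean(draws)              # Monte Carlo estimate of the mean
alpha / (alpha + beta)   # analytic mean, 2/7, for comparison
```

With n this large, the two numbers agree to roughly two or three decimal places, and the same sample could just as easily be used to estimate the variance, quantiles, or tail probabilities.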
Markov Chain Monte Carlo Methods
The basic idea is simple: we want to replace the unknown posterior distribution with a large sample whose properties mirror those of the posterior and that we can conveniently interpret to answer questions about the model parameters. How do we obtain this sample if the posterior is not analytically tractable? The answer lies in Bayes' rule, as given by Equations 6.9 and 6.20 in the previous chapter. Although we may be unable to write down an equation for the posterior and solve it for the parameters, θ, we still have access to the right-hand side of the equation. In particular, we necessarily know the prior distribution of the parameters, because we commence our inference by formulating those assumptions. We also typically know the likelihood of the observed data given the parameters (though Section 7.3 will relax even that assumption).
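The point that the right-hand side remains accessible can be made concrete with a small sketch: even without a closed-form posterior, we can evaluate prior times likelihood at any candidate value of θ. The data (7 successes in 10 trials), the Beta(1, 1) prior, and the grid resolution below are hypothetical choices for illustration, not from the text:

```r
# Sketch: evaluating the right-hand side of Bayes' rule.
# Hypothetical data and prior (assumptions, not from the text):
k <- 7; N <- 10                        # 7 successes in 10 trials
prior      <- function(theta) dbeta(theta, 1, 1)   # flat Beta(1, 1) prior
likelihood <- function(theta) dbinom(k, N, theta)

# Unnormalized posterior: known only up to the constant p(data).
unnorm_post <- function(theta) likelihood(theta) * prior(theta)

# Normalizing numerically on a grid recovers an approximate posterior.
h          <- 0.01
theta_grid <- seq(0, 1, by = h)
u          <- unnorm_post(theta_grid)
post       <- u / sum(u * h)           # approximate posterior density

sum(theta_grid * post * h)             # approximate posterior mean
```

For this conjugate example the exact posterior is Beta(8, 4) with mean 8/12 ≈ 0.667, so the grid answer can be checked directly; the MCMC methods developed in this chapter dispense with the grid altogether by working only with ratios of unnorm_post, in which the unknown normalizing constant cancels.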