The previous chapter explored the issue of model complexity, and focused on the fact that a model may give a good fit to a set of data simply by virtue of being more flexible. We also discussed several methods by which we can correct for complexity to obtain an unbiased measure of the fit of a model, in particular the AIC as a corrected estimator of the distance between the data and the “true” model. We pick up on both of these themes in this chapter, in which we discuss the Bayesian approach to model comparison, and how the Bayesian approach naturally accounts for model complexity.
We begin by presenting the core component of Bayesian model comparison – the marginal likelihood – and discuss how the relative fit of two models can be expressed in terms of Bayes factors. We then survey several methods for calculating the marginal likelihood, before discussing the particular role of the prior distributions when performing Bayesian model comparison.
Marginal Likelihoods and Bayes Factors
To understand Bayes factors, it is useful to first remind ourselves of Bayes theorem, as it applies to Bayesian parameter estimation (Equation 6.6). For convenience, we restate the theorem here:
P(θ|y) = P(y|θ) P(θ) / P(y)    (11.1)
For much of our discussion thus far, we focused on the proportional relationship P(θ|y) ∝ P(y|θ) P(θ), and dropped P(y) because it can be treated as a normalising constant (Equation 7.1). It turns out that P(y) – the marginal likelihood – plays a critical role in Bayesian model comparison. Indeed, it is also called the evidence because it quantifies the evidence the data y provide for the model.
To understand why, we need to remind ourselves that all of the components in Equation 11.1 are conditional on a particular model M; that is, the equation should really be presented as:
P(θ|y, M) = P(y|θ, M) P(θ|M) / P(y|M)    (11.2)
Accordingly, the evidence P(y|M) tells us about the probability of obtaining data y under model M, and thus how consistent the data are with the model.
It might not seem obvious how we can calculate P(y|M), but all the information we need is already present in Equation 11.2. The evidence is obtained by calculating the marginal likelihood of the data given the model:
P(y|M) = ∫ P(y|θ, M) P(θ|M) dθ
Effectively, what we are doing is considering how likely the data are at each point in the parameter space, and then averaging the resulting values, with each point weighted by its prior probability.
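This prior-weighted averaging suggests a simple (if inefficient) Monte Carlo sketch: draw parameter values from the prior and average the likelihood of the data across those draws. The binomial model, priors, and data below are our own assumptions for illustration; for a Beta prior the integral also has a closed form, which lets us check the estimate, and the ratio of two such evidences is a Bayes factor.

```python
import numpy as np
from math import comb
from scipy import stats
from scipy.special import betaln

rng = np.random.default_rng(1)

# Assumed data for illustration: 9 successes in 12 trials.
k, n = 9, 12

def marginal_likelihood(a, b, n_samples=200_000):
    """Monte Carlo estimate of P(y|M): draw theta from the model's
    Beta(a, b) prior and average the binomial likelihood P(y|theta)."""
    theta = rng.beta(a, b, size=n_samples)
    return stats.binom.pmf(k, n, theta).mean()

# Model 1: a vague, uniform prior on theta -- Beta(1, 1).
ml1 = marginal_likelihood(1, 1)
# Model 2: a prior committed to theta near 0.5 -- Beta(20, 20).
ml2 = marginal_likelihood(20, 20)

# The Bayes factor comparing the models is the ratio of their evidences.
bf12 = ml1 / ml2

# Closed-form check for Model 1: the Beta-Binomial marginal likelihood
# C(n, k) * B(a + k, b + n - k) / B(a, b) with a = b = 1.
exact_ml1 = comb(n, k) * np.exp(betaln(1 + k, 1 + n - k) - betaln(1, 1))
print(ml1, exact_ml1, bf12)
```

More accurate estimators of the marginal likelihood exist (several are surveyed later in this chapter); the point of the sketch is only that the evidence is a prior-weighted average of the likelihood.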