As discussed in Chapter 1, the EM algorithm is an iterative procedure for obtaining maximum likelihood estimates (MLEs) of the parameters of statistical models that are induced by a hidden variable construct, such as the hidden Markov model (HMM). The tree structure underlying the HMM allows us to organize the required computations, which leads to an efficient implementation of the EM algorithm for HMMs known as the Baum–Welch algorithm. For several examples of two-state HMMs with binary output, we plot the likelihood function and relate the paths taken by the EM algorithm to the gradient of the likelihood function.
The EM algorithm for hidden Markov models
The hidden Markov model is obtained from the fully observed Markov model by marginalization; see Sections 1.4.2 and 1.4.3. We will use the same notation as there, so σ = σ₁σ₂ … σₙ ∈ Σⁿ is a sequence of states and τ = τ₁τ₂ … τₙ ∈ (Σ′)ⁿ is a sequence of output variables. We assume that we observe N sequences τ¹, τ², …, τᴺ ∈ (Σ′)ⁿ, each of length n, but that the corresponding state sequences σ¹, σ², …, σᴺ ∈ Σⁿ are not observed (they are hidden).
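To make the marginalization concrete, here is a minimal sketch (our own illustration, not from Chapter 1) that computes the probability of an observed sequence τ by summing the joint probability over all hidden state sequences σ ∈ Σⁿ. The names marginal_prob, trans, emit, and init for the transition, emission, and initial-state probabilities are hypothetical, and the brute-force enumeration of all |Σ|ⁿ hidden paths is only practical for very small examples.

from itertools import product

def marginal_prob(tau, trans, emit, init):
    """Prob(tau) = sum over all hidden sequences sigma of Prob(sigma, tau).

    tau   : observed sequence, a tuple of output symbols in Sigma'
    trans : trans[r][s] = Prob(next state is s | current state is r)
    emit  : emit[r][t]  = Prob(output is t | current state is r)
    init  : init[r]     = Prob(first state is r)
    """
    states = list(init)
    total = 0.0
    # enumerate every hidden path sigma in Sigma^n
    for sigma in product(states, repeat=len(tau)):
        # joint probability Prob(sigma, tau) of one hidden path with tau
        p = init[sigma[0]] * emit[sigma[0]][tau[0]]
        for i in range(1, len(tau)):
            p *= trans[sigma[i - 1]][sigma[i]] * emit[sigma[i]][tau[i]]
        total += p
    return total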
In Section 1.4.2 it is assumed that there is a uniform distribution on the first state in each sequence, i.e., Prob(σ₁ = r) = 1/l for each r ∈ Σ, where l = |Σ|.
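Under this uniform initial distribution, the same marginal probability can be computed in O(n·l²) time by the forward recursion along the chain, which is the building block of the Baum–Welch algorithm mentioned above. The following sketch is again our own illustration, with hypothetical names forward_prob, trans, and emit; it replaces the sum over all lⁿ hidden paths by dynamic programming over one position at a time.

def forward_prob(tau, trans, emit):
    """Prob(tau) via the forward recursion, with Prob(sigma_1 = r) = 1/l."""
    states = list(trans)
    l = len(states)
    # f[r] = Prob(tau_1 ... tau_i, sigma_i = r), started from the uniform prior 1/l
    f = {r: (1.0 / l) * emit[r][tau[0]] for r in states}
    for t in tau[1:]:
        # sum out the previous hidden state, then emit the next observed symbol
        f = {s: sum(f[r] * trans[r][s] for r in states) * emit[s][t]
             for s in states}
    return sum(f.values())  # sum out the final hidden state to get Prob(tau)

As a check, for a hypothetical two-state HMM with binary output (the setting of the plotted examples in this chapter), the recursion agrees with the brute-force sum:

# illustrative parameter values only; states 'H', 'L' and all numbers are made up
trans = {'H': {'H': 0.7, 'L': 0.3}, 'L': {'H': 0.4, 'L': 0.6}}
emit  = {'H': {0: 0.9, 1: 0.1}, 'L': {0: 0.2, 1: 0.8}}
init  = {'H': 0.5, 'L': 0.5}   # uniform over the l = 2 states
tau = (0, 1, 1, 0)
assert abs(marginal_prob(tau, trans, emit, init) - forward_prob(tau, trans, emit)) < 1e-12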