This chapter focuses on basic statistical models that are widely used in speech and language processing: Gaussian mixture models (GMM), hidden Markov models (HMM), n-gram models and latent topic models. These are well-known generative models, each of which can generate speech and language features according to its likelihood function. We also provide parameter-learning schemes based on maximum likelihood (ML) estimation, which is derived according to the expectation and maximization (EM) algorithm (Dempster et al. 1976). The following chapters extend these statistical models from ML schemes to Bayesian schemes. We specifically build an automatic speech recognition (ASR) system based on these models and extend them to deal with different problems in speaker clustering, speech verification, speech separation and other natural language processing systems.
In this chapter, Section 3.1 first introduces the probabilistic approach to ASR, which aims to find the most likely word sequence W corresponding to the input speech feature vectors O. Bayes decision theory provides a theoretical basis for building a speech recognition system from the posterior distribution p(W|O) of the word sequence given the speech feature vectors O. Bayes' theorem then decomposes the problem based on p(W|O) into two problems based on two generative models: the model of speech features p(O|W) (acoustic model) and the model of language features p(W) (language model). Bayes' theorem thus reduces the original problem to two generative model problems that can be treated independently.
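The decomposition described above can be written out as a short worked equation. This is a sketch in the notation of this paragraph; the symbol \hat{W} for the recognized word sequence is an assumption introduced here for illustration.

```latex
\begin{align}
\hat{W} &= \arg\max_{W} \; p(W \mid O) \\
        &= \arg\max_{W} \; \frac{p(O \mid W)\, p(W)}{p(O)} \\
        &= \arg\max_{W} \; p(O \mid W)\, p(W).
\end{align}
```

The denominator p(O) does not depend on W, so it can be dropped from the maximization. What remains is the product of the acoustic model p(O|W) and the language model p(W).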
Next, Section 3.2 introduces the HMM with the corresponding likelihood function as a generative model of speech features. The section first describes the discrete HMM, which has a multinomial distribution as a state observation distribution, and Section 3.2.4 introduces the GMM as a state observation distribution of the continuous density HMM for acoustic modeling. The GMM by itself is also used as a powerful statistical model for other speech processing approaches in the later chapters. Section 3.3 presents two basic algorithms, the forward–backward algorithm and the Viterbi algorithm. In Section 3.4, ML estimation of HMM parameters is derived according to the EM algorithm, which deals efficiently with the latent variables included in the HMM. Thus, we provide the conventional ML treatment of basic statistical models for acoustic models based on the HMM.
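As a rough illustration of the discrete HMM and the two basic algorithms mentioned above, the following minimal sketch computes the likelihood of an observation sequence with the forward recursion and the most likely state sequence with the Viterbi recursion. The two-state model, its parameter values, and the names `initial`, `transition`, `emission`, `forward_likelihood` and `viterbi` are all hypothetical choices made for this example, not taken from the chapter. Probabilities are multiplied directly, which is only reasonable for very short sequences; a practical implementation would typically work in the log domain.

```python
# Toy discrete HMM: every number below is a made-up illustration value.
initial = [0.6, 0.4]            # initial[j]       = p(s_1 = j)
transition = [[0.7, 0.3],       # transition[i][j] = p(s_t = j | s_{t-1} = i)
              [0.4, 0.6]]
emission = [[0.5, 0.4, 0.1],    # emission[j][k]   = p(o_t = k | s_t = j)
            [0.1, 0.3, 0.6]]


def forward_likelihood(obs):
    """Return p(O), summing over all state sequences (forward recursion)."""
    n_states = len(initial)
    alpha = [initial[j] * emission[j][obs[0]] for j in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


def viterbi(obs):
    """Return the most likely state path and its joint probability with obs."""
    n_states = len(initial)
    delta = [initial[j] * emission[j][obs[0]] for j in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j at this frame.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] * transition[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * transition[best_i][j]
                             * emission[j][o])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


observations = [0, 1, 2]        # indices of discrete observation symbols
likelihood = forward_likelihood(observations)
best_path, best_prob = viterbi(observations)
print(likelihood, best_path, best_prob)
```

The two functions differ only in replacing the sum over predecessor states with a maximum, so the Viterbi probability can never exceed the forward likelihood.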