We know from the preceding chapter that the goal of parameter estimation is to find those parameter values that maximize the agreement between the model's predictions and the data. The extent of that agreement then tells us something (though not everything!) about the utility of the model. Moreover, interpretation of those parameter values can often shed light on the underlying processes. In this chapter we provide the basic set of tools necessary to achieve these goals.
Although we wish to maximize the similarity between the model predictions and the data, most parameter estimation procedures reframe this intention by instead minimizing the discrepancy between predictions and data.
Minimization requires a continuous discrepancy function that condenses the discrepancy between predictions and data into a single number. That discrepancy function is minimized by gradual and iterative adjustment of the parameters. The discrepancy function is also variously known as the objective function, cost function, or error function, and we will consider a few such functions along the way.
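To make the idea concrete, the following is a minimal sketch of a discrepancy function and a crude iterative search that adjusts the parameters until the discrepancy stops shrinking. Everything in it is an assumption made for illustration: the root mean squared deviation (RMSD) is just one possible discrepancy function, the straight-line model in `predict` is hypothetical, the data are made up, and the simple step-halving search stands in for the more refined minimization routines used in practice.

```python
import math


def rmsd(predictions, data):
    """Condense the discrepancy between predictions and data into one number."""
    squared = [(p - d) ** 2 for p, d in zip(predictions, data)]
    return math.sqrt(sum(squared) / len(data))


def predict(params, xs):
    """Hypothetical model for illustration: a line with intercept and slope."""
    intercept, slope = params
    return [intercept + slope * x for x in xs]


def discrepancy(params, xs, data):
    """Discrepancy of the model's predictions for a given set of parameters."""
    return rmsd(predict(params, xs), data)


def minimize(start, xs, data, step=1.0, tol=1e-6):
    """Gradual, iterative adjustment: nudge each parameter up and down,
    keep any change that lowers the discrepancy, and halve the step size
    whenever no nudge helps."""
    params = list(start)
    best = discrepancy(params, xs, data)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                value = discrepancy(trial, xs, data)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2
    return params, best


# Made-up, noise-free data generated from y = 2 + 0.5 * x (not real results).
xs = [0, 1, 2, 3, 4]
data = [2.0, 2.5, 3.0, 3.5, 4.0]

estimates, final_rmsd = minimize([0.0, 0.0], xs, data)
print("estimated intercept and slope:", estimates)
print("remaining discrepancy (RMSD):", final_rmsd)
```

Because the made-up data contain no noise, the search should recover parameter values close to the generating ones and drive the discrepancy close to zero. With real data the minimum would be greater than zero, and its size is what tells us something about how well the model agrees with the data.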
To illustrate, consider Figure 3.1, which presents data from one condition in a forgetting experiment reported by Carpenter et al. (2008). In the experiment, participants studied a set of 60 obscure facts (e.g., “greyhounds have the best eyesight of any dog”), and their memory for those facts was tested after 5 minutes, and again 1, 2, 7, 14, and 42 days later. A different subset of items was tested at each retention interval. The figure also shows the best-fitting predictions of a model of forgetting based on a power function. We first encountered the power function in Chapter 1 in connection with practice effects. As we foreshadowed there, the power function is also a good way to characterize forgetting. Carpenter et al. (2008) used this form of the power function:
Fitting Models to Data: Parameter Estimation Techniques
How do we minimize the discrepancy function? A number of competing approaches exist, and we will discuss them throughout the remainder of the book. The first two approaches are known as least-squares and maximum likelihood estimation, respectively, and this chapter and the next one are devoted to presenting them. The third approach, which involves the application of Bayesian statistics, will be discussed later in Chapters 6 through 9. Although the mechanics of least-squares and maximum likelihood estimation are quite similar, their underlying motivation and properties differ considerably.