From the very beginning of the study of reinforcement, there has been conflict between two fundamentally different interpretations. The first, exemplified by Thorndike, is that reinforcement is essentially a very simple process. When Thorndike placed his cats in the puzzle box for the first time, they struggled frantically to escape and reach the food dish outside. Eventually, after 8–10 minutes of scrambling about, a cat might accidentally contact the release mechanism and escape. If the cat had formed a rational appreciation of the situation, Thorndike argued, it should repeat this response immediately on subsequent trials:
If there were in these animals any power of inference, however rudimentary, however sporadic, however dim, there should have appeared among the multitude some cases when an animal, seeing through the situation, knows the proper act, does it, and from then on does it immediately upon being confronted with the situation. There ought, that is, to be a sudden vertical descent in the time-curve. (Thorndike, 1911, p. 73)
In all the scores of animals Thorndike tested, not once did he observe sudden and enduring improvement of this kind. In most instances, improvement over trials was a slow, gradual affair. (See Figure 4.2 for some representative records.)
To Thorndike, this gradual improvement in performance, with its occasional reversals and failures, did not at all resemble the behavior of a rational animal fully aware of the relationship between the latch and the door:
The gradual slope of the time-curve…shows the absence of reasoning. They represent the wearing smooth of a path in the brain, not the decisions of a rational consciousness. (Thorndike, 1911, p. 74)
Reinforcement, he concluded, caused the formation of an association between the rewarded response and the stimuli that were present at the time, so that thereafter these stimuli would elicit the response automatically.