A controlled Markov chain with finite state space has transition probabilities which depend on an unknown parameter α lying in a known finite set A. For each α, a stationary control law ϕ_α is given. This paper develops a control scheme whereby at each stage t a parameter α_t is chosen at random from among those parameters which nearly maximize the log likelihood function, and the control u_t is chosen according to the control law ϕ_{α_t}. It is proved that this algorithm leads to identification of the true α under conditions weaker than any previously considered.
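
The scheme described above can be illustrated with a minimal Python sketch. It is not the paper's exact algorithm: the abstract does not say what "nearly maximize" means or how the random choice is made, so the sketch assumes a fixed additive tolerance tol on the log likelihood and a uniform draw among the near-maximizers. The names near_maximizers, run, P and phi, and the two-state, two-action, two-parameter model, are hypothetical and chosen only for illustration.

```python
import math
import random

def near_maximizers(loglik, tol):
    """Parameters whose log likelihood is within tol of the maximum.
    The additive tolerance is an assumption, not the paper's definition."""
    best = max(loglik.values())
    return [a for a, v in loglik.items() if v >= best - tol]

def run(P, phi, true_alpha, x0, steps, tol=1.0, seed=0):
    """Simulate the randomized scheme and return the final log likelihoods.

    P[a][u][x][y] : probability of moving x -> y under control u, parameter a
    phi[a][x]     : control prescribed in state x by the law for parameter a
    """
    rng = random.Random(seed)
    loglik = {a: 0.0 for a in P}
    x = x0
    for _ in range(steps):
        # Choose alpha_t at random (uniformly, by assumption) among the
        # parameters that nearly maximize the current log likelihood.
        alpha_t = rng.choice(near_maximizers(loglik, tol))
        # Apply the control law associated with the chosen parameter.
        u = phi[alpha_t][x]
        # The chain itself evolves according to the true parameter.
        row = P[true_alpha][u][x]
        x_next = rng.choices(range(len(row)), weights=row)[0]
        # Update every candidate's log likelihood with the observed transition.
        for a in P:
            p = P[a][u][x][x_next]
            loglik[a] += math.log(p) if p > 0 else -math.inf
        x = x_next
    return loglik

# Hypothetical model: two states, two controls (0 and 1), two parameters.
P = {
    "a0": {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.5, 0.5], [0.5, 0.5]]},
    "a1": {0: [[0.1, 0.9], [0.8, 0.2]], 1: [[0.3, 0.7], [0.7, 0.3]]},
}
phi = {"a0": [0, 1], "a1": [1, 0]}

final = run(P, phi, true_alpha="a0", x0=0, steps=500)
estimate = max(final, key=final.get)
```

In this toy model the two parameters give different transition rows under every control, so identification is easy. The randomization among near-maximizers matters in harder cases, where the control law of a single maximizer could keep applying controls under which the true parameter and a false one look alike.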