Given a sequence S and a collection Ω of d words, it is of interest in many applications to characterize the multivariate distribution of the vector of counts
1), …, N(S,w
)), where N(S,w) is the number of times a word w ∈ Ω appears in the sequence S. We obtain an explicit bound on the error made when approximating the multivariate distribution of
by the normal distribution, when the underlying sequence is i.i.d. or first-order stationary Markov over a finite alphabet. When the limiting covariance matrix of
is nonsingular, the error bounds decay at rate
((log n) / √n) in the i.i.d. case and
((log n)3 / √n) in the Markov case. In order for
to have a nondegenerate covariance matrix, it is necessary and sufficient that the counted word set Ω is not full, that is, that Ω is not the collection of all possible words of some length k over the given finite alphabet. To supply the bounds on the error, we use a version of Stein's method.