1.Algoet, P.H. & Cover, T.M. (1988). A sandwich proof of the Shannon–McMillan–Breiman theorem. Annals of Probability 16: 899–909.

2.Amari, S. (1985). Differential-geometrical methods in statistics. New York: Springer-Verlag.

3.Barron, A.R. (1985). The strong ergodic theorem for densities: Generalized Shannon–McMillan–Breiman theorem. Annals of Probability 13: 1292–1303.

4.Breiman, L. (1957). The individual ergodic theorem of information theory. The Annals of Mathematical Statistics 28: 809–811.

5.Bowerman, B., David, H.T., & Isaacson, D. (1977). The convergence of Cesaro averages for certain nonstationary Markov chains. Stochastic Processes and their Applications 5: 221–230.

6.Chazottes, J.R., Giardina, C., & Redig, F. (2006). Relative entropy and waiting times for continuous time Markov processes. Electronic Journal of Probability 11: 1049–1068.

7.Chung, K.L. (1961). The ergodic theorem of information theory. The Annals of Mathematical Statistics 32: 612–614.

8.Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2: 299–318.

9.Gray, R. (2011). Entropy and information theory. 2nd ed. New York: Springer.

10.Hall, P. & Heyde, C.C. (1980). Martingale limit theory and application. New York: Academic Press.

11.Isaacson, D. & Madsen, R. (1976). Markov chains: Theory and applications. New York: Wiley.

12.Jia, C., Chen, D., & Lin, K. (2008). The application of the relative entropy density divergence in intrusion detection models. 2008 International Conference on Computer Science and Software Engineering, 951–954.

13.Kesidis, G. & Walrand, J. (1993). Relative entropy between Markov transition rate matrices. IEEE Transactions on Information Theory 39: 1056–1057.

14.Kieffer, J.C. (1974). A simple proof of the Moy-Perez generalization of the Shannon–McMillan–Breiman theorem. Pacific Journal of Mathematics 51: 203–204.

15.Kullback, S. & Leibler, R. (1951). On information and sufficiency. The Annals of Mathematical Statistics 22: 79–86.

16.Lai, J. & Ford, J.J. (2010). Relative entropy rate based Multiple Hidden Markov Model approximation. IEEE Transactions on Signal Processing 58(1): 165–174.

17.Ma, H.L. & Yang, W.G. (2011). Erratum to ‘The asymptotic equipartition property for asymptotic circular Markov chains’. Probability in the Engineering and Informational Sciences 25: 265–267.

18.McMillan, B. (1953). The basic theorems of information theory. The Annals of Mathematical Statistics 24: 196–219.

19.Ross, S. (1982). Stochastic processes. New York: Wiley.

20.Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal 27: 379–423.

21.Wang, Z.Z. & Yang, W.G. (2016). The generalized entropy ergodic theorem for nonhomogeneous Markov chains. Journal of Theoretical Probability 29: 761–775.

22.Yang, W.G. (1998). The asymptotic equipartition property for nonhomogeneous Markov information sources. Probability in the Engineering and Informational Sciences 12: 509–518.

23.Yang, W.G. & Liu, W. (2004). The asymptotic equipartition property for mth-order nonhomogeneous Markov information sources. IEEE Transactions on Information Theory 50(12): 3326–3330.

24.Yang, J. et al. (2017). Strong law of large numbers for generalized sample relative entropy of nonhomogeneous Markov chains. Communications in Statistics – Theory and Methods 47(2): 1571–1579.

25.Yari, G.H. & Nikooravesh, Z. (2011). Relative entropy rate between a Markov chain and its corresponding hidden Markov chain. Journal of Statistical Research of Iran 8: 97–109.

26.Zhong, P.P., Yang, W.G., & Liang, P.P. (2010). The asymptotic equipartition property for asymptotic circular Markov chains. Probability in the Engineering and Informational Sciences 24(2): 279–288.