Uniform Chernoff and Dvoretzky-Kiefer-Wolfowitz-Type Inequalities for Markov Chains and Related Processes

Published online by Cambridge University Press:  30 January 2018

Aryeh Kontorovich*
Affiliation:
Ben-Gurion University of the Negev
Roi Weiss*
Affiliation:
Ben-Gurion University of the Negev
* Postal address: Department of Computer Science, Ben-Gurion University, Beer Sheva, 84105, Israel.

Abstract

We observe that the technique of Markov contraction can be used to establish measure concentration for a broad class of noncontracting chains. In particular, geometric ergodicity provides a simple and versatile framework. This leads to a short, elementary proof of a general concentration inequality for Markov and hidden Markov chains, which supersedes some of the known results and easily extends to other processes such as Markov trees. As applications, we provide a Dvoretzky-Kiefer-Wolfowitz-type inequality and a uniform Chernoff bound. All of our bounds are dimension-free and hold for countably infinite state spaces.

Type
Research Article
Copyright
© Applied Probability Trust 
