
Globally convergent stochastic optimization with optimal asymptotic distribution

Published online by Cambridge University Press:  14 July 2016

Jürgen Dippon
Affiliation: Universität Stuttgart
Postal address: Mathematisches Institut A, Universität Stuttgart, 70511 Stuttgart, Germany. E-mail address: dippon@mathematik.uni-stuttgart.de

Abstract

A stochastic gradient descent method is combined with a consistent auxiliary estimate to achieve global convergence of the recursion. Using step lengths that converge to zero more slowly than 1/n and averaging the trajectories yields the optimal convergence rate of 1/√n and the optimal variance of the asymptotic distribution. Possible applications include maximum likelihood estimation, regression analysis, training of artificial neural networks, and stochastic optimization.
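The recursion described in the abstract follows the general pattern of stochastic gradient descent with slowly decaying step lengths and Polyak–Ruppert trajectory averaging. The sketch below is not the paper's algorithm; it is a minimal illustration under assumed choices (a hypothetical quadratic objective with additive gradient noise, step lengths a/n^γ with γ ∈ (1/2, 1), and a running average of the iterates).

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x):
    # Hypothetical objective f(x) = ||x||^2 / 2, so the true gradient is x;
    # additive Gaussian noise stands in for an observed stochastic gradient.
    return x + rng.normal(scale=1.0, size=x.shape)

def averaged_sgd(x0, n_steps=10_000, a=1.0, gamma=0.7):
    """Stochastic gradient descent with trajectory averaging (sketch).

    Step lengths a / n**gamma with gamma in (1/2, 1) decay to zero more
    slowly than 1/n; under suitable regularity conditions the running
    average of the iterates attains the 1/sqrt(n) rate with optimal
    asymptotic variance.
    """
    x = np.asarray(x0, dtype=float)
    x_bar = np.zeros_like(x)
    for n in range(1, n_steps + 1):
        x = x - (a / n**gamma) * noisy_grad(x)   # basic gradient recursion
        x_bar += (x - x_bar) / n                 # running average of the iterates
    return x, x_bar

last_iterate, averaged = averaged_sgd(np.array([5.0, -3.0]))
print("last iterate:", last_iterate)
print("averaged iterate:", averaged)
```

In this toy setting the averaged iterate is typically much closer to the minimizer at the origin than the last iterate, which is the practical payoff of averaging a slowly stepped recursion.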

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1998 

