Published online by Cambridge University Press: 31 March 2022
We describe the stochastic gradient method, the fundamental algorithm for several important problems in data science, including deep learning. We give several example problems for which this method is suitable, then described its operation for the simple problem of computing a mean of a collection of values. We related it to a classical method, the Kaczmarz method for solving a system of linear equalities and inequalities. Next, we describe the key assumptions to be used in convergence analysis, then describe the convergence rates attainable by several variants of stochastic gradient under several scenarios. Finally, we discuss several aspects of practical implementation of stochastic gradient, including minibatching and acceleration.