Published online by Cambridge University Press: 31 March 2022
This chapter describes methods based on gradient information that achieve faster rates than basic algorithms such as those described in Chapter 3. These accelerated gradient methods, most notably the heavy-ball method and Nesterov’s optimal method, use the concept of momentum: each step combines information not only from recent gradient values but also from earlier steps. The methods are analyzed using Lyapunov functions, with the cases of convex and strongly convex functions treated separately. We motivate these methods using continuous-time limits, which link gradient methods to dynamical systems described by differential equations. We also mention the conjugate gradient method, which was developed separately from the other methods but also makes use of momentum. Finally, we discuss lower bounds on algorithmic complexity, introducing a function on which no gradient-based method can converge faster than a certain rate.
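To make the idea of momentum concrete, the following is a minimal sketch (not the chapter's own code) comparing plain gradient descent with the heavy-ball method and Nesterov's method on a strongly convex quadratic. It assumes a hypothetical diagonal test problem $f(x) = \tfrac12 \sum_i \lambda_i x_i^2$ with $\mu \le \lambda_i \le L$, and uses commonly cited parameter choices for this setting: step $1/L$ for gradient descent; step $\alpha = 4/(\sqrt{L}+\sqrt{\mu})^2$ and momentum $\beta = \big((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu})\big)^2$ for heavy-ball; and step $1/L$ with momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, $\kappa = L/\mu$, for Nesterov. The function names and the test problem are illustrative assumptions, and the chapter's notation and parameter choices may differ.

```python
import math


def make_quadratic(eigs):
    """Hypothetical test problem: f(x) = 0.5 * sum_i eigs[i] * x[i]**2 (diagonal Hessian)."""
    def f(x):
        return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))

    def grad(x):
        return [lam * xi for lam, xi in zip(eigs, x)]

    return f, grad


def gradient_descent(grad, x0, L, iters):
    # Basic method: x_{k+1} = x_k - (1/L) * grad f(x_k), no momentum.
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return x


def heavy_ball(grad, x0, L, mu, iters):
    # x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}),
    # with Polyak's parameter choices for strongly convex quadratics (an assumption here).
    sL, smu = math.sqrt(L), math.sqrt(mu)
    alpha = 4.0 / (sL + smu) ** 2
    beta = ((sL - smu) / (sL + smu)) ** 2
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad(x)
        x_next = [xi - alpha * gi + beta * (xi - xp) for xi, gi, xp in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


def nesterov(grad, x0, L, mu, iters):
    # y_k = x_k + beta * (x_k - x_{k-1});  x_{k+1} = y_k - (1/L) * grad f(y_k),
    # with constant momentum for the strongly convex case (an assumption here).
    kappa = L / mu
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]  # momentum extrapolation
        g = grad(y)  # gradient evaluated at the extrapolated point
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, g)]
    return x


eigs = [1.0, 10.0, 50.0, 100.0]  # mu = 1, L = 100, so kappa = 100
mu, L = min(eigs), max(eigs)
f, grad = make_quadratic(eigs)
x0 = [1.0] * len(eigs)
iters = 200

results = {
    "gradient descent": f(gradient_descent(grad, x0, L, iters)),
    "heavy-ball": f(heavy_ball(grad, x0, L, mu, iters)),
    "nesterov": f(nesterov(grad, x0, L, mu, iters)),
}
for name, value in results.items():
    print(f"{name:>16}: f(x_{iters}) = {value:.3e}")
```

On this problem gradient descent contracts the slowest error component by roughly $1 - 1/\kappa$ per step, while both momentum methods contract it by roughly $1 - 1/\sqrt{\kappa}$, so after the same number of gradient evaluations the momentum methods reach a far smaller function value.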