Published online by Cambridge University Press: 31 March 2022
First derivatives (gradients) are needed by most of the algorithms described in this book. Here, we describe how these gradients can be computed efficiently for functions of the form that arise in deep learning. The reverse mode of automatic differentiation, often called “back-propagation” in the machine learning community, is described for several problems with the nested-composite and progressive structure that arises in neural network training. We also provide another perspective on these techniques, based on a constrained optimization formulation and its optimality conditions.
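As an illustration of the reverse mode on a progressive structure, the following is a minimal sketch, assuming the simplest case of a chain of scalar layers x_l = phi_l(x_{l-1}) with f = x_L. The layer list `LAYERS` and the functions `forward`, `backward`, and `value_and_grad` are hypothetical names chosen for this example, not taken from the chapter. The forward sweep stores every intermediate value; the reverse sweep then propagates the adjoint p_l = df/dx_l from the output back to the input using one local derivative per layer. In the constrained optimization view, where each equation x_l = phi_l(x_{l-1}) is treated as a constraint, these adjoints play the role of the Lagrange multipliers.

```python
import math

# Hypothetical progressive (layered) scalar function:
#   x_1 = phi_1(x_0), x_2 = phi_2(x_1), x_3 = phi_3(x_2), f = x_3
# so here f(x) = exp(sin(x)^2). Each layer is a pair (phi, dphi).
LAYERS = [
    (math.sin, math.cos),
    (lambda t: t * t, lambda t: 2.0 * t),
    (math.exp, math.exp),
]


def forward(x0, layers):
    """Forward sweep: evaluate the layers in order, storing all intermediates."""
    xs = [x0]
    for phi, _ in layers:
        xs.append(phi(xs[-1]))
    return xs


def backward(xs, layers):
    """Reverse sweep: propagate the adjoint p_l = df/dx_l from output to input."""
    p = 1.0  # df/dx_L = 1 because f = x_L
    # Walk the layers backwards, pairing each layer with its stored input x_{l-1}.
    for (_, dphi), x_in in zip(reversed(layers), reversed(xs[:-1])):
        p = p * dphi(x_in)  # chain rule: df/dx_{l-1} = df/dx_l * phi_l'(x_{l-1})
    return p  # df/dx_0


def value_and_grad(x0, layers=LAYERS):
    """Return f(x0) and f'(x0) using one forward and one reverse sweep."""
    xs = forward(x0, layers)
    return xs[-1], backward(xs, layers)


if __name__ == "__main__":
    value, grad = value_and_grad(0.7)
    print(value, grad)
```

The cost of the reverse sweep is one local derivative evaluation per layer, on top of the stored forward values. This trade of memory for speed is what makes the reverse mode efficient when a scalar objective depends on many inputs.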