Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- 1 The Modern Mathematics of Deep Learning
- 2 Generalization in Deep Learning
- 3 Expressivity of Deep Neural Networks
- 4 Optimization Landscape of Neural Networks
- 5 Explaining the Decisions of Convolutional and Recurrent Neural Networks
- 6 Stochastic Feedforward Neural Networks: Universal Approximation
- 7 Deep Learning as Sparsity-Enforcing Algorithms
- 8 The Scattering Transform
- 9 Deep Generative Models and Inverse Problems
- 10 Dynamical Systems and Optimal Control Approach to Deep Learning
- 11 Bridging Many-Body Quantum Physics and Deep Learning via Tensor Networks
6 - Stochastic Feedforward Neural Networks: Universal Approximation
Published online by Cambridge University Press: 29 November 2022
Summary
We examine the universal approximation question for stochastic feedforward neural networks. In contrast with deterministic networks, which represent mappings from inputs to outputs, stochastic networks represent mappings from inputs to probability distributions over outputs. Even if the sets of inputs and outputs are finite, the set of stochastic mappings is continuous. Moreover, the values of the output variables may be correlated, which requires that their values be computed jointly. Deep belief networks are a prominent class of stochastic feedforward networks. We discuss the representational power in terms of compositions of Markov kernels expressed by the layers of the network. We investigate different types of shallow and deep architectures, and the minimal number of layers and units that are necessary and sufficient for the network to approximate any stochastic mapping arbitrarily well. The discussion builds on notions of probability sharing, focusing on the case of binary variables and sigmoid units. After reviewing existing results, we present a detailed analysis of shallow networks and a unified analysis for a variety of deep networks.
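To make the setting concrete, the following is a minimal sketch of a stochastic feedforward network with binary variables and sigmoid units: each layer defines a Markov kernel from binary inputs to binary outputs, and the network composes these kernels by sampling layer by layer, so a fixed input induces a probability distribution over the output states. The class, function names, and random weights below are illustrative assumptions, not code from the chapter.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StochasticSigmoidLayer:
    """One layer of binary stochastic sigmoid units.

    Given a binary input vector x, each unit fires independently with
    probability sigmoid(W x + b), so the layer defines a Markov kernel
    from {0,1}^n_in to {0,1}^n_out.
    """
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.5, size=(n_out, n_in))  # illustrative random weights
        self.b = rng.normal(scale=0.5, size=n_out)
        self.rng = rng

    def sample(self, x):
        p = sigmoid(self.W @ x + self.b)          # firing probabilities
        return (self.rng.random(p.shape) < p).astype(int)

def forward_sample(layers, x):
    """Compose the layer kernels by sampling layer by layer."""
    for layer in layers:
        x = layer.sample(x)
    return x

# Estimate the output distribution induced by a fixed binary input.
rng = np.random.default_rng(0)
layers = [StochasticSigmoidLayer(3, 4, rng), StochasticSigmoidLayer(4, 2, rng)]
x = np.array([1, 0, 1])
samples = [tuple(forward_sample(layers, x)) for _ in range(5000)]
counts = {s: samples.count(s) / len(samples) for s in set(samples)}
print(counts)  # empirical distribution over {0,1}^2 for this input
```

The universal approximation question asked in the chapter is, roughly, how many layers and units such a network needs so that, by choosing the weights, the induced input-to-output-distribution map can approximate any stochastic mapping arbitrarily well.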
- Type: Chapter
- Information: Mathematical Aspects of Deep Learning, pp. 267-313
- Publisher: Cambridge University Press
- Print publication year: 2022