Book contents
- Frontmatter
- Contents
- Preface
- Notation
- 1 The Learning Methodology
- 2 Linear Learning Machines
- 3 Kernel-Induced Feature Spaces
- 4 Generalisation Theory
- 5 Optimisation Theory
- 6 Support Vector Machines
- 7 Implementation Techniques
- 8 Applications of Support Vector Machines
- A Pseudocode for the SMO Algorithm
- B Background Mathematics
- References
- Index
4 - Generalisation Theory
Published online by Cambridge University Press: 05 March 2013
Summary
The introduction of kernels greatly increases the expressive power of learning machines while retaining the underlying linearity that ensures learning remains tractable. The increased flexibility, however, raises the risk of overfitting, since the large number of degrees of freedom makes the choice of separating hyperplane increasingly ill-posed.
In Chapter 1 we made several references to the reliability of the statistical inferences inherent in the learning methodology. Successfully controlling the increased flexibility of kernel-induced feature spaces requires a sophisticated theory of generalisation, one that describes precisely which factors must be controlled in the learning machine in order to guarantee good generalisation. Several learning theories can be applied to this problem. The theory of Vapnik and Chervonenkis (VC) is the most appropriate for describing SVMs, and historically it motivated them, but it is also possible to give a Bayesian interpretation, among others.
In this chapter we review the main results of VC theory, which place reliable bounds on the generalisation of linear classifiers and hence indicate how to control the complexity of linear functions in kernel spaces. We also briefly review results from Bayesian statistics and compression schemes that can be used to describe such systems and to suggest which parameters to control in order to improve generalisation.
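The flavour of the VC-style bounds reviewed in the chapter can be conveyed by a small numeric sketch. The snippet below, which is illustrative rather than taken from the book (the function name and parameter choices are assumptions), evaluates the classical Vapnik bound: with probability at least 1 − δ over a sample of size m, the true risk of a classifier drawn from a class of VC dimension h exceeds its empirical risk by at most a capacity term that grows with h and shrinks with m.

```python
import math

def vc_bound(emp_risk, h, m, delta=0.05):
    """Classical VC generalisation bound (a sketch): empirical risk plus
    a capacity term depending on the VC dimension h, sample size m, and
    confidence parameter delta. Requires m > h so the log argument is > 1."""
    capacity = math.sqrt((h * (math.log(2 * m / h) + 1) + math.log(4 / delta)) / m)
    return emp_risk + capacity

# The bound tightens as the sample grows and loosens as capacity grows:
loose = vc_bound(0.05, h=100, m=1_000)      # small sample, same class
tight = vc_bound(0.05, h=100, m=100_000)    # larger sample -> smaller bound
rich = vc_bound(0.05, h=500, m=1_000)       # richer class -> larger bound
```

The qualitative behaviour is the point: controlling the capacity of the function class (here summarised by h) relative to the amount of data m is exactly what the generalisation theory of this chapter prescribes.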
- Type: Chapter
- Publisher: Cambridge University Press
- Print publication year: 2000