Book contents
- Frontmatter
- Contents
- Preface
- Notation
- 1 The Learning Methodology
- 2 Linear Learning Machines
- 3 Kernel-Induced Feature Spaces
- 4 Generalisation Theory
- 5 Optimisation Theory
- 6 Support Vector Machines
- 7 Implementation Techniques
- 8 Applications of Support Vector Machines
- A Pseudocode for the SMO Algorithm
- B Background Mathematics
- References
- Index
3 - Kernel-Induced Feature Spaces
Published online by Cambridge University Press: 05 March 2013
Summary
The limited computational power of linear learning machines was highlighted in the 1960s by Minsky and Papert. In general, complex real-world applications require more expressive hypothesis spaces than linear functions. Another way of viewing this problem is that frequently the target concept cannot be expressed as a simple linear combination of the given attributes, but in general requires that more abstract features of the data be exploited. Multiple layers of thresholded linear functions were proposed as a solution to this problem, and this approach led to the development of multi-layer neural networks and learning algorithms such as back-propagation for training such systems.
Kernel representations offer an alternative solution by projecting the data into a high-dimensional feature space to increase the computational power of the linear learning machines of Chapter 2. The use of linear machines in the dual representation makes it possible to perform this step implicitly. As noted in Chapter 2, the training examples never appear in isolation but always in the form of inner products between pairs of examples. The advantage of using the machines in the dual representation derives from the fact that in this representation the number of tunable parameters does not depend on the number of attributes being used. By replacing the inner product with an appropriately chosen ‘kernel’ function, one can implicitly perform a non-linear mapping to a high-dimensional feature space without increasing the number of tunable parameters, provided the kernel computes the inner product of the feature vectors corresponding to the two inputs. […]
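To make the closing point concrete, here is a minimal numerical sketch (not from the book; it assumes NumPy, and the names `phi` and `poly_kernel` are illustrative) checking that the degree-2 polynomial kernel K(x, z) = ⟨x, z⟩² computes exactly the inner product of explicitly mapped feature vectors:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-dimensional input:
    # phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2).
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    # Homogeneous polynomial kernel of degree 2: K(x, z) = <x, z>^2.
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Both quantities agree (121.0 here): the kernel evaluates the inner
# product in the 3-dimensional feature space without ever mapping the
# inputs into it.
print(np.dot(phi(x), phi(z)))  # 121.0
print(poly_kernel(x, z))       # 121.0
```

Because the learning machine in the dual representation only ever needs these inner products, the same substitution scales to feature spaces of very high (even infinite) dimension at no additional computational cost.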
- Type: Chapter
- Publisher: Cambridge University Press
- Print publication year: 2000