Book contents
- Frontmatter
- Contents
- Preface
- List of abbreviations
- 1 Determination of the accurate location of an aircraft
- 2 When to replace equipment
- 3 Secondary structure prediction using least squares and singular value decomposition
- 4 Secondary structure prediction using least squares and best basis
- 5 Secondary structure prediction with learning methods (nearest neighbors)
- 6 Secondary structure prediction with linear programming (LP)
- 7 Stock market prediction
- 8 Phylogenetic tree construction
- Appendix A Methods for function minimization
- Appendix B Online resources
- Index
5 - Secondary structure prediction with learning methods (nearest neighbors)
Published online by Cambridge University Press: 17 February 2011
Summary
Topics
- Nearest neighbor searching
- Clustering
- Binary search trees
Learning methods are, in general, methods that adapt their parameters using historical data. Nearest neighbors (NN) is an extreme in this direction: all the training data are stored, and the most suitable part is used for each prediction. Unless there are contradictory data, every time NN sees the same information as in the training set, it returns the same answer as in the training set. In this sense it is a perfect method, as it reproduces the training data exactly.
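To make this concrete, here is a minimal 1-NN sketch in Python (an illustration, not the book's code; `nn_predict`, `X_train`, and `y_train` are hypothetical names). A query identical to a stored training point has distance zero to that point, so its stored answer is returned, which is exactly the repetition property described above.

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """Return the stored answer of the training point closest to x (1-NN).

    A query identical to a training point has distance zero to it,
    so NN reproduces the stored training answer exactly."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to all m points
    return y_train[np.argmin(dists)]
```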
Generalities of nearest neighbor methods (Figure 5.1)
(i) NN methods extract data from close neighbors, which means that a distance function has to be defined between the data points (to determine what is close and what is distant). The data columns may be in very different units, for example kg, years, mm, US$, etc., so the data have to be normalized before computing distances (see the first sketch after this list).
(ii) Finding neighbors in small sets of training data (e.g. m < 1000) is best done with a sequential search of all the data, also shown in the first sketch after this list. Our interest here is therefore in problems where m is very large and we have to compute neighbors many times.
(iii) As opposed to best basis, where extra variables were not obviously harmful (in some cases random columns were even useful), NN deteriorates if we use data columns that are not related to the quantity we are predicting.
(iv) NN methods have a close relationship with clustering; our main algorithm for NN can also be used for clustering (see the second sketch after this list).
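Points (i) and (ii) can be illustrated with a short Python sketch (illustrative only, not the book's code; the function names are hypothetical). The columns are first normalized to zero mean and unit standard deviation so that mixed units do not distort the distance, and the neighbor is then found by a sequential scan over all m rows.

```python
import numpy as np

def normalize_columns(X):
    """Scale every column to zero mean and unit standard deviation,
    so columns in different units (kg, years, mm, US$) are comparable."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return (X - mu) / sigma, mu, sigma

def sequential_nearest(X, x):
    """Sequential (brute-force) search: compare x against all m rows of X.
    For small m (e.g. m < 1000) this is the method of choice."""
    best, best_d2 = -1, np.inf
    for i in range(X.shape[0]):
        d2 = float(np.sum((X[i] - x) ** 2))  # squared Euclidean distance
        if d2 < best_d2:
            best, best_d2 = i, d2
    return best
```

Note that a query must be normalized with the same mean and standard deviation as the training data (the `mu` and `sigma` returned above) before it is compared against them.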
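The link with clustering in point (iv) shows up clearly in k-means, a standard clustering algorithm used here purely for illustration (the book's own algorithm may differ): the assignment step is precisely a nearest neighbor search of every data point against the current centroids.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Basic k-means; the assignment step is a nearest neighbor
    search of every data point against the current centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(iters):
        # nearest-centroid assignment: an m-by-k NN computation
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: move each centroid to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```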
Scientific Computation, pp. 66–81. Publisher: Cambridge University Press. Print publication year: 2009.