Secondary structure prediction with learning methods (nearest neighbors)

Gaston H. Gonnet; Ralf Scholl

doi:10.1017/CBO9780511815027.006

5 - Secondary structure prediction with learning methods (nearest neighbors)

Published online by Cambridge University Press: 17 February 2011


Affiliation (Gaston H. Gonnet): Eidgenössische Technische Hochschule Zürich


Summary

Topics

Nearest neighbor searching
Clustering
Binary search trees

Learning methods are, in general, methods that adapt their parameters using historical data. Nearest neighbors (NN) is an extreme in this direction: all training data are stored, and the most suitable part is used for each prediction. Unless the training data contain contradictions, every time NN sees the same information as in the training set, it returns the same answer as in the training set. In this sense it is a perfect method, as it reproduces the training data exactly.
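The claim that NN reproduces its training data can be checked with a minimal sketch. The function name, the two-dimensional points, and the labels below are hypothetical and chosen only for illustration; Euclidean distance is likewise an assumption, since the summary does not fix a distance function.

```python
import math

def nearest_neighbor_predict(train_x, train_y, query):
    """Return the stored answer of the training point closest to `query`."""
    best_index = min(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))
    return train_y[best_index]

# Hypothetical training set: feature vectors and their known answers.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
train_y = ["helix", "sheet", "coil", "helix"]

# Each training point is at distance 0 from itself, so querying with it
# returns exactly the answer that was stored for it.
reproduced = [nearest_neighbor_predict(train_x, train_y, x) for x in train_x]
```

Because the four points are distinct, `reproduced` equals `train_y`. If the same point appeared twice with different answers (contradictory data), this sketch would return whichever copy comes first.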

Generalities of nearest neighbor methods (Figure 5.1)

(i) NN methods extract data from close neighbors, which means that a distance function has to be defined between the data points (to determine what is close and what is distant). The data columns may be in very different units, for example kg, years, mm, US$, etc., so the data have to be normalized before computing distances.
(ii) For small sets of training data (e.g. m < 1000), neighbors are best found by a sequential search of all the data. Our interest here is therefore in problems where m is very large and neighbors have to be computed many times.
(iii) Unlike best basis, where extra variables were not obviously harmful (in some cases random columns were even useful), NN deteriorates if we include data columns that carry no information about the problem at hand.
(iv) NN methods have a close relationship with clustering for which our main algorithm for NN can also be used.
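Points (i) and (ii) can be illustrated with a short sketch: normalize each column, then find neighbors by sequential search. The choice of z-score normalization (subtract the column mean, divide by the column standard deviation) and of Euclidean distance are assumptions made here for illustration, as are the helper names and the sample records; the summary only states that some normalization is needed.

```python
import math
import statistics

def column_stats(rows):
    """Per-column mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has standard deviation 0; use 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stdevs

def normalize(row, means, stdevs):
    """Rescale one row so every column is in comparable, unit-free terms."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stdevs))

def k_nearest(rows, query, k):
    """Sequential search: indices of the k rows closest to `query`."""
    order = sorted(range(len(rows)), key=lambda i: math.dist(rows[i], query))
    return order[:k]

# Hypothetical records: (weight in kg, length in mm).
raw = [(70.0, 1700.0), (95.0, 1560.0), (60.0, 2500.0)]
query = (72.0, 1550.0)

means, stdevs = column_stats(raw)
scaled = [normalize(r, means, stdevs) for r in raw]
scaled_query = normalize(query, means, stdevs)

nearest_raw = k_nearest(raw, query, 1)[0]               # mm column dominates
nearest_scaled = k_nearest(scaled, scaled_query, 1)[0]  # both columns count
```

On these sample records the raw search picks record 1, because the millimetre column dwarfs the kilogram column, while the normalized search picks record 0. The sequential search examines every stored point for each query, which is adequate for small m and is the cost that faster neighbor-search structures aim to avoid when m is large.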

Type: Chapter
Information: Scientific Computation, pp. 66-81

DOI: https://doi.org/10.1017/CBO9780511815027.006

Publisher: Cambridge University Press

Print publication year: 2009
