Published online by Cambridge University Press: 05 July 2014
Outlier detection is an important subject in machine learning and data analysis. The term outlier refers to abnormal observations that are inconsistent with the bulk of the data distribution [16, 32, 98, 240, 265, 266, 287]. Some sample applications are as follows.
• Detection of imposters or rejection of unauthorized access to computer networks.
• Genomic research – identifying abnormal gene or protein sequences.
• Biomedical monitoring, e.g. ECG arrhythmia detection.
• Environmental safety detection, where outliers indicate abnormality.
• Personal safety, with security aids embedded in mobile devices.
For some real-world application examples, see, e.g., Hodge and Austin.
The standard approach to outlier detection is density-based, whereby detection depends on an outlier's relationship with the bulk of the data. Many algorithms use notions of proximity and/or density estimation to find outliers. In high-dimensional spaces, however, the data become increasingly sparse, the notion of proximity/density becomes less meaningful, and consequently model-based methods become more appealing [1, 39]. It is also typically assumed that a model is to be trained from only one type of (say, positive) training patterns. This makes outlier detection a fundamentally different problem and creates a new learning paradigm, leading to one-class-based learning models.
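To make the proximity/density idea concrete, the following is a minimal sketch (not from the text) of a classical distance-based criterion: score each point by its distance to its k-th nearest neighbour, so that points in sparse regions receive large scores. The function name and toy data are illustrative.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    A large score means the point sits in a sparse region, i.e. it is a
    candidate outlier under a proximity/density-based criterion.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# A tight cluster plus one distant point; the distant point scores highest.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(data, k=2)
outlier_index = max(range(len(scores)), key=lambda i: scores[i])
```

In high dimensions, such pairwise distances concentrate and lose discriminative power, which is exactly the motivation given above for model-based, one-class approaches.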
SVM-type learning models are naturally amenable to outlier detection since certain support vectors can be identified as outliers.
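As an illustration of this idea (a sketch, assuming the reader has scikit-learn available; the data and parameter values are arbitrary), a one-class SVM is trained on positive patterns only, and new observations are labelled as inliers or outliers. The parameter `nu` upper-bounds the fraction of training points treated as outliers and lower-bounds the fraction of support vectors.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# "Positive" training data only: a single Gaussian blob in 2-D.
X_train = rng.normal(loc=0.0, scale=0.5, size=(200, 2))

# One-class SVM with an RBF kernel; nu bounds the outlier fraction.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

# predict() returns +1 for inliers and -1 for outliers.
X_test = np.array([[0.0, 0.1],   # near the training cluster
                   [4.0, 4.0]])  # far from the training cluster
pred = ocsvm.predict(X_test)

# The boundary is determined by the support vectors; training points
# flagged as outliers are among them.
n_support = len(ocsvm.support_)
```

Here the support vectors define the decision boundary, and the training points that fall outside it (at most roughly a `nu` fraction) are exactly the ones the model identifies as outliers.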