Proceedings of ACM SIGMOD, pages 37–46, 2001. and . Outlier detection for high dimensional data. In
 On least squares and linear combinations of observations. Proc. Royal Soc. Edinburgh, 55:42–48, 1935..
 A new look at the statistical model identification. IEEE Trans. Automatic Control, 19(6):716–723, 1974..
 et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511, 2000., ,
 et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. USA, 96(12):6745, 1999., ,
 Int. J. Neural Systems, 12:117–135, 2002.. Kernel-Kohonen networks.
 et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1):41–47, 2002., ,
 Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337–404, 1950..
 Bioinformatics: The Machine Learning Approach, 2nd edition. Cambridge, MA: MIT Press, 2001. and .
 Generalized discriminant analysis using a kernel approach. Neural Computation, 12:2385–2404, 2000. and .
 et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med., 8:816–824, 2002., .
 Dynamic Programming. Princeton, NJ: Princeton University Press, 1957..
 Adaptive Control Processes: A Guided Tour. Princeton, NJ: Princeton University Press, 1961..
 Learning distributions by their density levels: A paradigm for learning without a teacher. J. Computer System Sci., 55:171–182, 1997. and .
 et al. Tissue classification with gene expression profiles. J. Comput. Biol., 7:559–583, 2000., ,
 Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press., , , and . A support vector method for hierarchical clustering. In , , and , editors,
 Nonlinear Programming. Belmont, MA: Athena Scientific, 1995..
 Proceedings, International Conference on Intelligent Sensing and Information Processing, pages 433–438, 2004.. Robust classification of noisy data using second order cone programming approach. In
 Prediction and Improved Estimation in Linear Models. New York: Wiley, 1977. and .
 Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995..
 Training with noise is equivalent to Tikhonov regularization. Neural Comput., 7:108–116, 1995..
 Pattern Recognition and Machine Learning. Berlin: Springer, 2006..
 Proceedings of the 8th Annual Conference on Computational Learning Theory (COLT '95). New York: ACM Press, pages 265–272, 1995., , , and . Online learning via congregational gradient descent. In
 Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992., , and . A training algorithm for optimal margin classifiers. In , editor,
 Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7-9):1257–1273, 2008., , , and .
 Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004., , , and .
 International Conference on Machine Learning, pages 82–90, 1998. and . Feature selection via concave minimization and support vector machines. In
 Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3):378–380, 2004. and .
 et al. Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Nat. Acad. Sci. USA, 97(1):262–267, 2000., ,
 A tutorial on support vector machines for pattern recognition. Knowledge Discovery Data Mining, 2(2):121–167, 1998..
 A novel kernel method for clustering. IEEE Trans. Pattern Anal. Machine Intell., 27(5):801–804, 2005. and .
 Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2001. and . A linear programming approach to novelty detection. In
 Proceedings, The 1st International Workshop on Machine Learning for Vision-based Motion Analysis - MLVMA08, 2008., , and . Approximate RBF kernel SVM and its applications in pedestrian classification. In
[33b] et al. Capturing cognitive fingerprints from keystroke dynamics. IT Professional, 15(4):24–28, 2013., ,
 LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Technol., 2(27):1–27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. and .
 Support vector machines for histogram-based image classification. IEEE Trans. Neural Networks, 10:1055–1064, 1999., , and .
 Choosing kernel parameters for support vector machines. In Machine Learning, 46:131–159, 2002., , , and .
 A tutorial on ν-support vector machines, 2003 (http://www.kernel-machines.org)., , and .
 Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), volume 8, pages 93–103, 2000. and . Biclustering of expression data. In
 Learning from Data, Concepts, Theory and Methods. New York: John Wiley, 1998. and .
 Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 42–53, 2001. and . Knowledge discovery in multi-label phenotype data. In
 Support vector networks. Machine Learning, 20:273–297, 1995. and .
 Methods of Mathematical Physics. New York: Interscience, 1953. and .
 Methods of Mathematical Physics, volumes I and II. New York: Wiley Interscience, 1970. and .
 Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Computers, 14:326–334, 1965..
[44b] Cox. Multidimensional Scaling. London: Chapman and Hall, 1994. and .
 Data set provider. http://www.igi.tugraz.at/aschwaig.
 Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng., 51(7):1196–1206, 2004., , and
 Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc., Ser. B, 39(1):1–38, 1977., , and .
 Proceedings of the 10th ACM KDD Conference, Seattle, WA, 2004., , and . Kernel K-means, spectral clustering and normalized cuts. In
 Proceedings of the IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP-2012), Kyoto, pages 2057–2060, 2012. and . Binary classification by minimizing the mean squared slack. In
 Principal Component Neural Networks. New York: Wiley, 1996. and .
 Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res., 2:263–286, 1995. and .
 Advances in Neural Information Processing Systems (NIPS '96), Volume 9. Cambridge, MA: MIT Press, pages 155–161, 1997., , , , and . Support vector regression machines. In
 Pattern Classification and Scene Analysis. New York: Wiley, 1973. and .
 Pattern Classification, 2nd edition. New York: Wiley, 2011., , and .
 Comparison of discrimination methods for the classification of tumors using gene expression data. Technical Report 576, Department of Statistics, University of California, Berkeley, CA, 2000., , and .
 Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Statist. Assoc., 97:77–88, 2002., , and .
 An analogue approach to the travelling salesman problem using an elastic net method. Nature, 326:689–691, 1987. and .
 Bootstrap methods: Another look at the jackknife. Ann. Statist., 7:1–26, 1979..
 The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1982..
 Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Am. Statist. Assoc., 78:316–331, 1983..
 Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA, 95:14863–14868, 1998., , , and .
 The kernel recursive least-squares algorithm. IEEE Trans. Signal Processing, 52(8):2275–2285, 2004., , and .
 Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, 1999., , and (editors).
 Theoretical foundation of the potential function method in pattern recognition learning. Automation Remote Control, 25:821–837, 1964., , and .
 Advances in Neural Information Processing Systems 11. Cambridge, MA: MIT Press, pages 438–444, 1999., , , and . Classification on pairwise proximity data.
 et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999., ,
 A survey of kernel and spectral methods for clustering. Pattern Recognition, 41:176–190, 2008., , , and .
 The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7:179–188, 1936..
 Practical Methods of Optimization, 2nd edition. New York: Wiley, 1987..
 Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21:768–769, 1965..
 A two-sample Bayesian t-test for microarray data. BMC Bioinformatics, 7(1):126, 2006. and .
 Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999. and .
 Introduction to Statistical Pattern Recognition, 2nd edition. Amsterdam: Elsevier, 1990..
[73b] Proceedings, ACMKDD01, San Francisco, 2001. and . Proximal support vector machine classifiers. In
 et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906–914, 2000., ,
 Scoring clustering solutions by their biological relevance. Bioinformatics, 19(18):2381–2389, 2003., , and .
 et al. Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Comput., 14(5):1115–1147, 2002., ,
 Judging the quality of gene expression-based clustering methods using gene annotation. Genome Res., 12:1574–1581, 2002. and .
 Mercer kernel based clustering in feature space. IEEE Trans. Neural Networks, 13(3):780–784, 2002..
 Matrix Computations, 3rd edition. Baltimore, MD: Johns Hopkins University Press, 1996. and .
 Calculating the singular values and pseudo-inverse of a matrix. J. Soc. Industrial Appl. Math.: Ser. B, Numerical Anal., 2(2):205–224, 1965. and .
 An analysis of the total least squares problem. SIAM J. Numerical Anal., 17:883–893, 1980. and .
 et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999., ,
 Adaptive Filtering: Prediction and Control. Englewood Cliffs, NJ: Prentice Hall, 1984. and .
 Note on free lunches and cross-validation. Neural Comput., 9:1211–1215, 1997..
 Proceedings of the Fifth GI Workshop Fuzzy Neuro Systems, pages 90–97, 1998. and . Fuzzy topographic kernel clustering. In , editor,
 Errors-in-the-variables bias in nonlinear contexts. Econometrica, 38(2):368–370, 1970. and .
 Support vector machines for classification and regression. USC-ISIS Technical Report, 1998..
 Proceedings, 2006 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '06), pages 416–422, 2006., , and . Eukaryotic protein subcellular localization based on local pairwise profile alignment SVM.
 Gene selection for cancer classification using support vector machines. Machine Learning, 46:389–422, 2002., , , and .
 Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, Princeton, NJ, 1991., , and . (Editors).
 Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, 2007., , , and . Topographic processing of relational data. In
 Direct clustering of a data matrix. J. Am. Statist. Assoc., 67(337):123–129, 1972..
 et al. “Gene shaving” as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol., 1(2):research0003.1-research0003.21, 2000., ,
 Adaptive Filter Theory, 3rd edition. Englewood Cliffs, NJ: Prentice Hall, 1996..
 Neural Networks: A Comprehensive Foundation, 2nd edition. Englewood Cliffs, NJ: Prentice Hall, 2004..
 Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press, 2005., , and . Laplacian score for feature selection. In
 Trends and controversies - support vector machines. IEEE Intelligent Systems, 13:18–28, 1998., , , , and .
 A survey of outlier detection methodologies. Artif. Intell. Rev., 22:85–126, 2004. and .
 Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 42(1):80–86, 1970. and .
[99b] Kernel methods in machine learning. Ann. Statist., 36(3):1171–1220, 2008., , and .
 Analysis of a complex of statistical variables into principal components. J. Educational Psychol., 24:498–520, 1933..
 A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, 2002. and .
 Advances in Neural Information Processing Systems 22, Cambridge, MA: MIT Press, pages 772–780, 2009., , , and . Multi-label prediction via compressed sensing. In
 A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. J. Molec. Biol., 308(2):397–407, 2001. and .
 Prediction of protein subcellular locations using fuzzy K-NN method. Bioinformatics, 20(1):21–28, 2004. and .
 Robust statistics: A review. Ann. Math. Statist., 43:1041–1067, 1972..
 Robust Statistics. New York: John Wiley and Sons, 1981..
 et al. Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Lancet, 361(9361):923–929, 2003., ,
 LVQ clustering and SOM using a kernel function. In Proceedings of IEEE International Conference on Fuzzy Systems, Volume 3, pages 1497–1500, 2004. and .
 Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms. Mechanical Systems Signal Processing, 16:373–390, 2002. and .
 An assessment of recently published gene expression data analyses: Reporting experimental design and statistical factors. BMC Med. Inform., 6:27, 2006. and .
 Data clustering: A review. ACM Comput. Surveys, 31(3):264–323, 1999., , and .
 Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, Volume 1. Berkeley, CA: University of California Press, pages 361–380, 1960. and . Estimation with quadratic loss. In
 Proceedings of European Conference on Machine Learning, Berlin: Springer, pages 137–142, 1997.. Text categorization with support vector machines: Learning with many relevant features.
 Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, 1999.. Making large-scale SVM learning practical. In , , and , editors,
 Principal Component Analysis, 2nd edition. New York: Springer, 2002..
 Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, volume 3, pages 2017–2022, 2004. and . Robust outlier detection using SVM regression. In
 An Introduction to Probabilistic Graphical Models. Cambridge, MA: MIT Press, 2002. and .
 IEEE Workshops on Neural Networks for Signal Processing. New York: IEEE Press, 1991., , and .
 Linear Systems. Englewood Cliffs, NJ: Prentice Hall, 1980..
 Linear Estimation. Englewood Cliffs, NJ: Prentice Hall, 2000., , and .
 Snakes: Active contour models. Int. J. Computer Vision, 1:321–331, 1987., , and .
 Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13:637–649, 2001., , , and .
 et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6):673–679, 2001., ,
 et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7:673–679, 2001., ,
 Texture classification with kernel principal component analysis. Electron. Lett., 36(12):1021–1022, 2000., , , and .
 Some results on Tchebycheffian spline functions. J. Math. Anal. Applications, 33:82–95, 1971. and .
 Optimization by simulated annealing. Science, 220:671–680, 1983., , and .
 Wrappers for feature selection. Artif. Intell., 97(1-2):273–324, 1997. and .
 Self-organized formation of topologically correct feature map. Biol. Cybernet., 43:59–69, 1982..
 Self-Organization and Associative Memory. New York: Springer, 1984..
 Self-Organizing Maps, 2nd edition. Berlin: Springer, 1997..
 Proceedings, North American Chapter of the Association for Computational Linguistics, 2001. and . Chunking with support vector machines. In
 Why systolic architectures? IEEE Computer, 15(1):37–46, 1982..
 VLSI Array Processors. Englewood Cliffs, NJ: Prentice Hall, 1988..
 Digital Neural Networks. Englewood Cliffs, NJ: Prentice Hall, 1993..
 Proceedings of PCM 2009, Bangkok, pages 1–32. Berlin: Springer-Verlag, 2009.. Kernel approaches to unsupervised and supervised machine learning. In
 Adaptive principal component extraction (APEX) and applications. IEEE Trans. Signal Processing, 42(5):1202–1217, 1994., , and .
 Neural Networks for Signal Processing II. Piscataway, NJ: IEEE, 1992., , , and (Editors).
 PDA-SVM hybrid: A unified model for kernel-based supervised classification. J. Signal Processing Systems, 65(1):5–21, 2011. and .
 Feature selection for self-supervised classification with applications to microarray and sequence data. IEEE J. Selected Topics Signal Processing: Special Issue Genomic and Proteomic Signal Processing, 2(3):297–309, 2008. and .
 Biometric Authentication: A Machine Learning Approach. Upper Saddle River, NJ: Prentice Hall, 2005., , and .
 IEEE Computational Systems Bioinformatics Conference, Stanford, CA, 2005., , and . Multi-metric and multi-substructure biclustering analysis for gene expression data. In
 Symmetric and asymmetric multi-modality biclustering analysis for microarray data matrix. J. Bioinformatics Comput. Biol., 4(3):275–298, 2006., , and .
 Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), Kyoto, 2012. and . On efficient learning and classification kernel methods. In
 Proceedings, 2012 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '12), 2012. and . Perturbation regulated kernel regressors for supervised machine learning. In
 Recursive kernel trick for network segmentation. Int. J. Robust Nonlinear Control, 21(15):1807–1822, 2011. and .
 Proceedings of the 18th International Conference on Machine Learning, San Francisco. New York: Morgan Kaufmann, 2001. and . Estimating a kernel Fisher discriminant in the presence of label noise. In
 Plaid models for gene expression data. Technical report, 03, 2000 (www-stat.stanford.edu/owen/reports/plaid.pdf). and .
 Proceedings of ICASSP, pages 1597–1600, 2011., , and . Improving kernel-energy trade-offs for machine learning in implantable and wearable biomedical applications. In
 Journal of Signal Processing Systems, Berlin: Springer, published online, 2012., , and . Low-energy formulations of support vector machine kernel functions for biomedical sensor applications.
 Principles of Signal Detection and Parameter Estimation. Berlin: Springer, 2008..
 Performance Evaluation of Maximum Log-Likelihood Classification, ELE 571 Course Project Report, Princeton University, Princeton, NJ, 2010..
 Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition. Protein Peptide Lett., 19:375–387, 2012., , , , and .
 Toward intelligent music information retrieval. IEEE Trans. Multimedia, 8(3):564–574, 2006. and .
 et al. Transcriptional network analysis identifies BACH1 as a master regulator of breast cancer bone metastasis. J. Biol. Chem., 287(40):33533–33544, 2012., ,
 Opensource EEG libraries and toolkit. http://www.goomedic.com/opensource-eeg-libraries-and-toolkits-for-developers.html.
 Kernel Adaptive Filtering: A Comprehensive Introduction. New York: Wiley, 2010., , and .
 System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice Hall, 1999..
 Proceedings, 6th International Parallel Processing Symposium, Los Alamitos, CA, pages 247–249, 1991., , and . Analysis of neighborhood interaction in Kohonen neural networks. In
 Analysis of the convergence properties of topology preserving neural networks. IEEE Trans. Neural Networks, 4:207–220, 1993., , and .
 Searching for hypothetical proteins: Theory and practice based upon original data and literature. Prog. Neurobiol., 77:90–127, 2005., , , and .
 Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics, 21(24):4356–4362, 2005. and .
 Proceedings of 4th International Conference on Knowledge-Based Intelligence Engineering Systems and Applied Technologies, 2000. and . The kernel self organising map. In
 Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probabilities ( and , editors), volume 1, pages 281–297. Berkeley, CA: University of California Press, 1967.. Some methods for classification and analysis of multivariate observations. In
 On the generalised distance in statistics. J. Proc. Asiatic Soc. Bengal, 2:49–55, 1936..
 PairProSVM: Protein subcellular localization based on local pairwise profile alignment and SVM. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 5(3):416–422, 2008., , and .
 International Conference on Neural Information Processing, pages 314–323, 2006. and . A solution to the curse of dimensionality problem in pairwise scoring techniques. In
 Fusion of feature selection methods for pairwise scoring SVM. Neurocomputing, 71(16-18):3104–3113, 2008. and .
 Proceedings of ICASSP, Kyoto, 2012. and . Low-power SVM classifiers for sound event classification on mobile devices. In
 Learning Theory and Kernel Machines. Berlin: Springer-Verlag, pages 102–113, 2003., , and . Knowledge-based nonlinear kernel classifiers. In
 On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18:50–60, 1947. and .
 Mathworks-SVM. Mathworks bioinformatics toolbox.
 A geometric approach to support vector machine (SVM) classification. IEEE Trans. Neural Networks, 17(3):671–682, 2006. and .
 Discriminant Analysis and Statistical Pattern Recognition. New York: John Wiley & Sons, 1992..
 Functions of positive and negative type, and their connection with the theory of integral equations. Trans. London Phil. Soc., A209:415–446, 1909..
 Kernel Fisher Discriminants. PhD thesis, The Technical University of Berlin, Berlin, 2002..
 Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, pages 591–597, 2001., , and . A mathematical programming approach to the kernel Fisher algorithm. In
 Neural Networks for Signal Processing IX, pages 41–48, 1999., , , , and . Fisher discriminant analysis with kernels. In , , , and , editors,
 Proceedings AISTATS, San Francisco, CA, pages 98–104. New York: Morgan Kaufmann, 2001., , and . An improved training algorithm for kernel Fisher discriminants. In and , editors,
 Mathematical Classification and Clustering. Berlin: Springer, 1996..
 Machine Learning. New York: McGraw-Hill, 1997..
 A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. Technical Report, HP Laboratories Cambridge, 2004., , and .
 MSPsim. http://www.sics.se/project/mspsim.
 Proceedings, IEEE Workshop on Neural Networks for Signal Processing, Amelia Island, FL, pages 276–285, 1997., , and . Nonlinear prediction of chaotic time series using support vector machines. In , , , and , editors,
 An introduction to kernel-based learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001., , , , and .
 et al. Predicting time series with support vector machines. In Proceedings, International Conference on Artificial Neural Networks, London: Springer-Verlag, pages 999–1004, 1997., ,
 An introduction to kernel-based learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001., , , , and .
 Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, pages 599–605, 1997., , , and . Adaptive on-line learning in changing environments. In , , and , editors,
 Context-sensitive methods for learning from genomic data. Thesis, Department of Electrical Engineering, Princeton University, Princeton, NJ, 2007..
 Bioinformatics. Published online, Oxford University Press, 2005., , , and . Accurate detection of aneuploidies in array CGH and gene expression microarray data. In
 On estimating regression. Theory Probability Applications, 9:141–142, 1964..
 Neural network frequently asked questions. http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html.
 On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20:175–240, 1928. and .
 Laplacian linear discriminant analysis approach to unsupervised feature selection. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 6(4):605–614, 2009. and .
 et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res., 63(7):1602–1607, 2003., ,
 A simplified neuron model as a principal component analyzer. J. Math. Biol., 15:267–273, 1982..
 An improved training algorithm for support vector machines. In , , , and , Editors, Proceedings, IEEE Workshop on Neural Networks for Signal Processing VII, Amelia Island, FL, pages 276–285, 1997., , and .
 Cambridge, MA, 1985.. Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, MIT,
 On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065–1076, 1962..
 International Conference on Computational Biology, Pittsburgh, PA, pages 249–255, 2001., , , and . Gene functional classification from heterogeneous data. In
 On lines and planes of closest fit to systems of points in space. Phil. Mag. Ser. 6, 2:559–572, 1901..
 The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford: Oxford University Press, 2003..
 PhysioNet. http://www.physionet.org.
 Advances in Kernel Methods - Support Vector Learning, Cambridge, MA: MIT Press, pages 185–208, 1999.. Fast training of support vector machines using sequential minimal optimization. In , , and , editors,
 Advances in Neural Information Processing Systems 10, 1998.. Using analytic QP and sparseness to speed training of support vector machines. In
 Systematic benchmarking of microarray data classification: Assessing the role of nonlinearity and dimensionality reduction. Bioinformatics, 20(17):3185–3195, 2004., , , and .
 Networks for approximation and learning. Proc. IEEE, 78(9):1481–1497, 1990. and .
 et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870):436–442, 2002., ,
 Support vector machines for 3D object recognition. IEEE Trans. Pattern Analysis Machine Intell., 20:637–646, 1998. and .
 An Introduction to Signal Detection and Estimation, 2nd edition, Berlin: Springer, 1994..
 Microwave Engineering, 3rd edition. New York: Wiley, 2005..
 Proteomic cancer classification with mass spectrometry data. Am. J. Pharmacogenomics, 5(5):281–292, 2005., , and .
 et al. Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, 98(26):15149–15154, 2001., ,
 Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 254–269, 2009., , , and . Classifier chains for multi-label classification. In
 GeneCluster 2.0: An advanced toolset for bioarray analysis. Bioinformatics, 20(11):1797–1798, 2004., , , , and
 Using neural networks for prediction of the subcellular location of proteins. Nucl. Acids Res., 26:2230–2236, 1998. and .
 Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press, 1996..
 A universal prior for integers and estimation by minimum description length. Ann. Statist., 11(2):416–431, 1983..
 Neural Computation and Self-Organizing Maps: An Introduction. Reading, MA: Addison-Wesley, 1992., , and .
 The perceptron: A probabilistic model for information storage and organization of the brain. Psychol. Rev., 65:386–408, 1958..
 Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27:832–837, 1956..
 Nonparametric Techniques in Statistical Inference. London: Cambridge University Press, pages 199–213, 1970.. Density estimates and Markov sequences. In , editor,
 Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, pages 817–824, 2003., , , and . Going metric: Denoising pairwise data. In
 Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, pages 568–574, 2000. and . Nonlinear discriminant analysis using kernel functions. In , , and , editors,
 Proceedings, 2013 IEEE International Workshop on Machine Learning For Signal Processing, Southampton, 2013., , and . Parameter design tradeoff between prediction performance and training time for ridge-SVM. In
 Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. Cambridge, MA: MIT Press/Bradford Books, 1986., , and . Learning internal representations by error propagation. In , , and the PDP Research Group, editors,
 Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459–473, 1989..
 Fundamentals of Adaptive Filtering. New York: Wiley, 2003..
 A state space approach to adaptive RLS filtering. IEEE Signal Processing Mag., 11:18–60, 1994. and .
 Fundamentals of Adaptive Filtering. John Wiley, 2003 (see page 30)..
 Boostexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000. and .
 Statistical Signal Processing. Reading, MA: Addison-Wesley, 1991..
 Nonparametric regression in the presence of measurement error. Econometric Theory, 20(6):1046–1093, 2004..
 2000.. Statistical learning and kernel methods. Technical Report MSR-TR 200023, Microsoft Research,
 Proceedings, International Conference on Artificial Neural Networks, 1996., , and . Incorporating invariances in support vector learning machines. In
 2000., , , and . A generalized representer theorem. NeuroCOLT2 Technical Report Series, NC2-TR-2000-82,
 Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443–1472, 2001., , , , and .
 Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput., 10:1299–1319, 1998., , and K.- .
[238b] New support vector algorithms. Neural Comput., 12:1207–1245, 2000., , , and .
 Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002. and .
 Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, pages 568–574, 2000., , , , and . Support vector method for novelty detection. In , , and , editors,
 . SVM toolbox for MATLAB.
 Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978..
 Support vector machine techniques for nonlinear equalization. IEEE Trans. Signal Processing, 48(11):3217–3226, 2000. and .
 Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press, 2004. and .
 Virus-mPLoc: A fusion classifier for viral protein subcellular location prediction by incorporating multiple sites. J. Biomol. Struct. Dyn., 26:175–186, 2010. and .
 Prior knowledge in support vector kernels. Advances in Neural Information Processing Systems 10, pages 640–646, 1998., , and .
 et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–209, 2002., ,
 Online classification using kernels and projection-based adaptive algorithms. IEEE Trans. Signal Processing, 56(7):2781–2797, 2008., , and .
 Comparison of biosequences. Adv. Appl. Math., 2:482–489, 1981. and .
 The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649, 1998., , and .
 Advances in Large Margin Classifiers. Cambridge, MA: MIT Press, 2000., , , and .
 Numerical Taxonomy: The Principles and Practice of Numerical Classification. San Francisco, CA: W. H. Freeman, 1973. and .
 et al. Prediction of protein retention times in anion-exchange chromatography systems using support vector regression. J. Chem. Information Computer Sci., 42:1347–1357, 2002., ,
 Support vector machine based arrhythmia classification using reduced features. Int. J. Control, Automation, Systems, 3:571–579, 2005., , , , and .
 et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. PNAS, 100(14):8418–8423, 2003., ,
 Probabilistic neural networks. Neural Networks, 3:109–118, 1990..
 et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12):3273–3297, 1998., ,
 Implementational issues of support vector machines. Technical Report CSD-TR-96-18, Computational Intelligence Group, Royal Holloway, University of London, 1996. and .
 Introduction to Linear Algebra. Wellesley, MA: Wellesley Cambridge Press, 2003..
 Least squares support vector machine classifiers. Neural Processing Lett., 9(3):293–300, 1999. and .
 SVMlight. http://svmlight.joachims.org/.
 Proceedings, Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE '05). Minneapolis, MN, pages 89–96, 2005., , and . Multi-class biclustering and classification based on modeling of gene regulatory networks. In
 et al. Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Nat. Acad. Sci. USA, 96:2907–2912, 1999., ,
 Systematic determination of genetic network architecture. Nature Genetics, 22:281–285, 1999., , , , and .
 Proceedings of the European Symposium on Artificial Neural Networks, ESANN '99, Brussels, pages 251–256, 1999. and . Duin. Data domain description using support vectors. In (Editor),
 Duin. Support vector domain description. Pattern Recognition Lett., 20:1191–1199, 1999. and .
 Pattern Recognition, 4th edition. New York: Academic Press, 2008. and .
 Regression shrinkage and selection via the LASSO. J. Royal Statist. Soc. B, 58:267–288, 1996..
 IEEE Trans. Neural Networks Learning Systems, 99 accepted for publication. 2013., , and . The multikernel least mean square algorithm.
 et al. Novelty detection in mass spectral data using a support vector machine method. In Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000., ,
 Support vector machine active learning with applications to text classification. J. Machine Learning Res., 2:45–66, 2002. and .
 Numerical Linear Algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1997. and
 Multi-label classification: An overview. Int. J. Data Warehousing Mining, 3:1–13, 2007. and .
 Data Mining and Knowledge Discovery Handbook, 2nd edition. Berlin: Springer, 2010., , and . Mining multi-label data. In and (Editors),
 On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943..
 Two-mode clustering methods: A structured overview. Statist. Methods Med. Res., 13(5):363–394, 2004., , and .
 et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536, 2002.. ,
 Estimation of dependences based on empirical data [in Russian]. Moscow, Nauka, 1979. (English translation New York: Springer, 1982.).
 Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, pages 281–287, 1997., , and . Support vector method for function approximation, regression estimation, and signal processing. In , , and (editors),
 The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995..
 Statistical Learning Theory. New York: Wiley, 1998..
 Decision trees for hierarchical multi-label classification. Machine Learning, 73(2):185–214, 2008., , , , and .
 Proceedings of the 6th International Workshop on Self-Organizing Maps. Bielefeld: Bielefeld University, 2007. and . A comparison between dissimilarity SOM and kernel SOM clustering the vertices of a graph. In
 Spline Models for Observational Data. Philadelphia, PA: SIAM, 1990..
 BMC Bioinformatics, 13:290, 2012 (available at http://link.springer.com/article/10.1186/1471-2105-13-290/fulltext.html)., , and . mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines.
 Proceedings of ICASSP '13, Vancouver, pages 3547–3551, 2013., , and . Adaptive thresholding for multi-label SVM classification with application to protein subcellular localization prediction. In
 A cluster validity measure with outlier detection for support vector clustering. IEEE Trans. Systems, Man, Cybernet. B, 38:78–89, 2008. and .
 The doubly regularized support vector machine. Statist. Sinica, 16:589–615, 2006., , and .
 Proceedings of the IEEE Workshops on Multimedia Signal Processing. Princeton, NJ: IEEE Press, 1997., , , , and . In
 et al. Computational intelligence approach for gene expression data mining and classification. In Proceedings of the IEEE International Conference on Multimedia & Expo. Princeton, NJ: IEEE Press, 2003., ,
 et al. Discriminatory mining of gene expression microarray data. J. Signal Processing Systems, 35:255–272, 2003., ,
 Hierarchical grouping to optimize an objective function. J. Am. Statist. Assoc., 58:236–244, 1963..
 Smooth regression analysis. Sankhya: Indian J. Statist. Ser. A, 26:359–372, 1964..
 Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge, MA, 1974..
 , , , and . http://www.kyb.tuebingen.mpg.de/bs/people/spider/main.html.
 Use of the zero-norm with linear models and kernel methods. J. Machine Learning Res., 3:1439–1461, 2003., , , and .
 Proceedings of ESANN, Brussels, 1999. and . Multi-class support vector machines. In
 Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 1984. and .
 Interpolation and Smoothing of Stationary Time Series. Cambridge, MA: MIT Press, 1949..
[299b] Support vector machine implementations for classification & clustering. BMC Bioinformatics, 7:S4, published online, 2006., , , and .
 Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weight-based approach. J. Machine Learning Res., 6:1855–1887, 2005. and .
 Inverting modified matrices. In Statistical Research Group Memorandum Report 42, MR38136, Princeton University, Princeton, NJ, 1950..
 Kernel-induced optimal linear estimators and generalized Gauss-Markov theorems. Submitted 2013. and .
[302b] Proceedings of ICASSP '14, Florence, Italy, 2014., , , , and . Cost-effective kernel ridge regression implementation for keystroke-based active authentication system. In
 Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, pages 49–54, 2003., , and . Fuzzy c-means clustering algorithm based on kernel method. In
 iLoc-Virus: A multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. J. Theor. Biol., 284:42–51, 2011., , and .
 CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics, 17(90001):306–315, 2001. and .
 Advances in Neural Information Processing Systems 21. Cambridge, MA: MIT Press, pages 1801–1808, 2009., , and . Robust regression and LASSO. In
 II. Survey of clustering algorithms. IEEE Trans. Neural Networks, 16(3):645–678, 2005. and
 Proceedings of the International Conference on Machine Learning, pages 848–855, 2003., , , and Optimizing classifier performance via the Wilcoxon-Mann-Whitney statistics. In
 Outliers treatment in support vector regression for financial time series prediction. In ICONIP '04, pages 1260–1265, 2004., , , , and .
 IEEE International Workshop on Machine Learning for Signal Processing, Southampton, 2013., , , and . Ridge-adjusted slack variable optimization for supervised classification. In
 Proceedings, 2013 International Conference on Acoustics, Speech, and Signal Processing, 2013., , and . A classification scheme for “high-dimensional-small-sample-size” data using SODA and ridge-SVM with medical applications. In
[311b] Proceedings, 12th International Conference on Machine Learning and Applications (ICMLA '13), volume 4B, page 340., , and . Kernel SODA: A feature reduction technique using kernel based analysis. In
 Recent advances of large-scale linear classification. Proc. IEEE, 100:2584–2603, 2012., and .
 Multikernel adaptive filtering. IEEE Trans. Signal Processing, 60(9):4672–4682, 2012..
 The motion coherence theory. In Proceedings, International Conference on Computer Vision, pages 344–353, 1988. and .
 A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artif. Intell. Med., 32:37–50, 2004. and .
 Support vector machine learning for image retrieval. In Proceedings of the 2001 International Conference on Image Processing, volume 2, pages 721–724, 2001., , and .
 Computational prediction of eukaryotic protein-coding genes. Nature Rev. Genetics, 3(9):698–709, 2002..
 et al. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics, 7(197), 2006., ,
 Proceedings of ICASSP, Kyoto, 2012., , and . Color-frequency-orientation histogram based image retrieval. In
[319b] Methods of Microarray Data Analysis, CAMDA '00. Boston, MA: Kluwer Academic Publishers, pages 125–136, 2001., , and . Applying classification separability analysis to microarray data. In and , editors,
 Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004., , , and . 1-norm SVMs. In
 Regularization and variable selection via the elastic net. J. Royal Statist. Soc., Ser. B, 67(2):301–320, 2005. and .