Kernel Methods and Machine Learning

Book description

Offering a fundamental basis in kernel-based learning theory, this book covers both statistical and algebraic principles. It provides over 30 major theorems for kernel-based supervised and unsupervised learning models. The first of these theorems establishes a condition, arguably necessary and sufficient, for the kernelization of learning models. Several other theorems are devoted to proving mathematical equivalence between seemingly unrelated models. With over 25 closed-form and iterative algorithms, the book provides a step-by-step guide to algorithmic procedures and to analysing which factors to consider when tackling a given problem, enabling readers to improve specifically designed learning algorithms, build models for new applications, and develop efficient techniques suitable for green machine learning technologies. Numerous real-world examples and over 200 problems, several of which are MATLAB-based simulation exercises, make this an essential resource for graduate students and professionals in computer science, electrical engineering, and biomedical engineering. Solutions to problems are provided online for instructors.
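As a small illustration of the kind of kernel-based supervised learning the book covers, the sketch below shows kernel ridge regression with an RBF (Gaussian) kernel: a linear regressor is "kernelized" by replacing inner products with kernel evaluations, so a nonlinear target can be fitted in closed form. This is a minimal sketch, not code from the book; all function names, the toy data, and the parameter values (gamma, ridge) are illustrative choices.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, gamma=1.0, ridge=1e-3):
    """Closed-form solution alpha = (K + ridge*I)^{-1} y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    """Prediction f(x) = sum_i alpha_i k(x, x_i)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy data: y = sin(x), a nonlinear target a plain linear model cannot fit.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X).ravel()

alpha = kernel_ridge_fit(X, y, gamma=0.5)
pred = kernel_ridge_predict(X, alpha, X, gamma=0.5)
```

The only change relative to ordinary ridge regression is that the n-by-n Gram matrix K replaces the data matrix's inner products; any model admitting such a rewrite (the kernelization condition the book's first theorem addresses) can be extended the same way.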


References
[1] C. C., Aggarwal and P. S., Yu. Outlier detection for high dimensional data. In Proceedings of ACM SIGMOD, pages 37–46, 2001.
[2] A. C., Aitken. On least squares and linear combinations of observations. Proc. Royal Soc. Edinburgh, 55:42–18, 1935.
[3] H., Akaike. A new look at the statistical model identification. IEEE Trans. Automatic Control, 19(6):716–723, 1974.
[4] A. A., Alizadeh, M. B., Eisen, R. E., Daviset al.Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511, 2000.
[5] U., Alon, N., Barkai, D. A., Nottermanet al.Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. USA, 96(12):6745, 1999.
[6] P., Andras. Kernel-Kohonen networks. Int. J. Neural Systems, 12;117–135, 2002.
[7] S. A., Armstrong, J. E., Staunton, L. B., Silvermanet al.MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1):41–47, 2002.
[8] N., Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337–404, 1950.
[9] P., Baldi and S., Brunak. Bioinformatics: The Machine Learning Approach, 2nd edition. Cambridge, MA: MIT Press, 2001.
[10] G., Baudat and F., Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 12:2385–2404, 2000.
[11] D. G., Beer, S. L., R. Kardia, C.-C. Huanget al.Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med., 8:816–824, 2002.
[12] R., Bellman. Dynamic Programming.Princeton, NJ: Princeton University Press, 1957.
[13] R., Bellman. Adaptive Control Processes: A Guided Tour.Princeton, NJ: Princeton University Press, 1961.
[14] Ben-David, S. and Lindenbaum, M.. Learning distributions by their density levels: A paradigm for learning without a teacher. J. Computer System Sci., 55:171–182, 1997.
[15] A., Ben-Dor, L., Bruhn, N., Friedmanet al.Tissue classification with gene expression profiles. J. Computai. Biol., 7:559–583, 2000.
[16] A., Ben-Hur, D., Horn, H., Siegelmann, and V., Vapnik. A support vector method for hier¬archical clustering. In T. K., Leen, T. G., Dietterich, and V., Tresp, editors, Advances in Neural Information Processing Systems 13.Cambridge, MA: MIT Press.
[17] D. P., Bertsekas. Nonlinear Programming.Belmont, MA: Athena Scientific, 1995.
[18] C., Bhattacharyya. Robust classification of noisy data using second order cone program¬ming approach. In Proceedings, International Conference on Intelligent Sensing and Information Processing, pages 433–438, 2004.
[19] J., Bibby and H., Toutenburg. Prediction and Improved Estimation in Linear Models. New York: Wiley, 1977.
[20] C. M., Bishop. Neural Networks for Pattern Recognition.Oxford: Oxford University Press, 1995.
[21] C. M., Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Comput., 7:108–116, 1995.
[22] C. M., Bishop. Pattern Recognition and Machine Learning.Berlin: Springer, 2006.
[23] K. L., Blackmore, R. C., Williamson, I. M., Mareels, and W. A., Sethares. Online learning via congregational gradient descent. In Proceedings of the 8th Annual Conference on Computational Learning Theory (COLT '95).New York: ACM Press, pages 265–272, 1995.
[24] B. E., Boser, I. M., Guyon, and V. N., Vapnik. A training algorithm for optimal margin classifiers. In D., Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.
[25] R., Boulet, B., Jouve, F., Rossi, and N., Villa. Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7-9):1257–1273, 2008.
[26] M., Boutell, J., Luo, X., Shen, and C., Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
[27] P. S., Bradley and O. L., Mangasarian. Feature selection via concave minimization and support vector machines. In International Conference on Machine Learning, pages 82–90, 1998.
[28] U. M., Braga-Neto and E. R., Dougherty. Is cross-validation valid for small-sample microarray classification?Bioinformatics, 20(3):378–380, 2004.
[29] M. P. S., Brown, W. N., Grundy, D., Linet al.Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Nat. Acad. Sci. USA, 97(1):262–267, 2000.
[30] C.J.C., Burges. A tutorial on support vector machines for pattern recognition. Knowledge Discovery Data Mining, 2(2):121–167, 1998.
[31] F., Camastra and A., Verri. A novel kernel method for clustering. IEEE Trans. Pattern Anal. Machine Intell., 27(5):801–804, 2005.
[32] C., Campbell and K. P., Bennett. A linear programming approach to novelty detection. In Advances in Neural Information Processing Systems 14.Cambridge, MA: MIT press, 2001.
[33] H., Cao, T., Naito, and Y., Ninomiya. Approximate RBF kernel SVM and its applications in pedestrian classification. In Proceedings, The 1st International Workshop on Machine Learning for Vision-based Motion Analysis - MLVMA08, 2008.
[33b] J. Morris, Chang, C. C., Fang, K. H., Hoet al.Capturing cognitive fingerprints from keystroke dynamics. IT Professional, 15(4):24–28, 2013.
[34] Chih-Chung, Chang and Chih-Jen, Lin. LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Technol., 2(27):1–27, 2011. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.
[35] O., Chapelle, P., Haffner, and V. N., Vapnik. Support vector machines for histogram-based image classification. IEEE Trans. Neural Networks, 10:1055–1064, 1999.
[36] O., Chapelle, V., Vapnik, O., Bousquet, and S., Mukhejee. Choosing kernel parameters for support vector machines. In Machine Learning, 46:131–159, 2002.
[37] P. H., Chen, C. J., Lin, and B., Schölkopf. A tutorial on v-support vector machines, 2003 (http://www.kernel-machines.org).
[38] Y., Cheng and G. M., Church. Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), volume 8, pages 93–103, 2000.
[39] V., Chercassky and P., Mullier. Learning from Data, Concepts, Theory and Methods. New York: John Wiley, 1998.
[40] A., Clare and R. D., King. Knowledge discovery in multi-label phenotype data. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 42–53, 2001.
[41] C., Cortes and V., Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995.
[42] R., Courant and D., Hilbert. Methods of Mathematical Physics.New York: Interscience, 1953.
[43] R., Courant and D., Hilbert. Methods of Mathematical Physics, volumes I and II. New York: Wiley Interscience, 1970.
[44] T. M., Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Computers, 14:326–334, 1965.
[44b] T. F., Cox and M. A., A. Cox. Multidimensional Scaling.London: Chapman and Hall, 1994.
[45] Data set provider. http://www.igi.tugraz.at/aschwaig.
[46] P., de Chazal, M., O'Dwyer, and R. B., ReillyAutomatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng., 51(7):1196–1206, 2004.
[47] A. P., Dempster, N. M., Laird, and D. B., Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc., Ser. B, 39(1):1–38, 1977.
[48] I. S., Dhillon, Y., Guan, and B., Kulis. Kernel K-means, spectral clustering and normalized cuts. In Proceedings of the 10th ACM KDD Conference, Seattle, WA, 2004.
[49] K., Diamantaras and M., Kotti. Binary classification by minimizing the mean squared slack. In Proceedings of the IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP-2012), Kyoto, pages 2057–2060, 2012.
[50] K. I., Diamantaras and S. Y., Kung. Principal Component Neural Networks. New York: Wiley, 1996.
[51] T. G., Dietterich and G., Bakiri. Solving multiclass learning problems via error-correcting output codes, J. Artif. Intell. Res., 2:263–286, 1995.
[52] H., Drucker, C. J. C., Burges, L., Kaufman, Smola, A., and V., Vapnik. Support vector regression machines. In Advances in Neural Information Processing Systems (NIPS '96), Volume 9. Cambridge, MA: MIT Press, pages 155–161, 1997.
[53] R. O., Duda and P. E., Hart. Pattern Classification and Scene Analysis.New York: Wiley 1973.
[54] R. O., Duda, P. E., Hart, and D.G., Stork. Pattern Classification, 2nd edition. New York: Wiley, 2011.
[55] S., Dudoit, J., Fridlyand, and T. P., Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Technical Report 576, Department of Statistics, University of California, Berkeley, CA, 2000.
[56] S., Dudoit, J., Fridlyand, and T. P., Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Statist. Assoc., 97:77–88, 2002.
[57] R., Durbin and D. J., Willshaw. An analogue approach to the travelling salesman problem using an elastic net method. Nature, 326:689–691, 1987.
[58] B., Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist., 7:1–26, 1979.
[59] B., Efron. The Jackknife, the Bootstrap and Other Resampling Plans.Philadelphia, PA: Society for Industrial and Applied Mathematics, 1982.
[60] B., Efron. Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Am. Statist. Assoc., 78:316–331, 1983.
[61] M. B., Eisen, P. T., Spellman, P. O., Brown, and D., Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA, 95:14863–14868, 1998.
[62] Y., Engel, S., Mannor, and R., Meir. The kernel recursive least-squares algorithm. IEEE Trans. Signal Processing, 52(8):2275–2285, 2004.
[63] B., Schölkopf, C. J. C., Burges, and A. J., Smola (editors). Advances in Kernel Methods -Support Vector Learning.Cambridge, MA: MIT Press, 1999.
[64] M., Aizerman, E. A., Braverman, and L., Rozonoer. Theoretical foundation of the potential function method in pattern recognition learning. Automation Remote Control, 25:821–837, 1964.
[65] T., Graepel, R., Herbrich, P., Bollman-Sdorra, and K., Obermayer. Classification on pairwise proximity data. Advances in Neural Information Processing Systems 11.Cambridge, MA: MIT Press, pages 438–444, 1999.
[66] T. R., Golub, D. K., Slonim, P., Tamayoet al.Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
[67] M., Filippone, F., Camastra, F., Masulli, and S., Rosetta. A survey of kernel and spectral methods for clustering. Pattern Recognition, 41:176–190, 2008.
[68] R. A., Fisher. The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7:179–188, 1936.
[69] R., Fletcher. Practical Methods of Optimization, 2nd edition. New York: Wiley, 1987.
[70] E. W., Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21:768–769, 1965.
[71] R. J., Fox and M. W., Dimmic. A two-sample Bayesian t-test for microarray data. BMC Bioinformatics, 7(1):126, 2006.
[72] Y., Freund and R., Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
[73] K., Fukunaga. Introduction to Statistical Pattern Recognition, 2nd edition. Amsterdam: Elsevier, 1990.
[73b] G., Fung and O. L., Mangasarian. Proximal support vector machine classifiers. In Proceedings, ACMKDD01, San Francisco, 2001.
[74] T. S., Furey, N., Cristianini, N., Duffyet al.Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906–914, 2000.
[75] I., Gat-Viks, R., Sharan, and R., Shamir. Scoring clustering solutions by their biological relevance. Bioinformatics, 19(18):2381–2389, 2003.
[76] T. V., Gestel, J. A. K., Suykens, G., Lanckrietet al.Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Comput., 14(5):1115–1147, 2002.
[77] F. D., Gibbons and F. P., Roth. Judging the quality of gene expression-based clustering methods using gene annotation. Genome Res., 12:1574–1581, 2002.
[78] M., Girolami. Mercer kernel based clustering in feature space. IEEE Trans. Neural Networks, 13(3):780–784, 2002.
[79] G., Golub andLoan, C. F. Van. Matrix Computations, 3rd edition. Battimore, MD: Johns Hopkins University Press, 1996.
[80] G. H., Golub and W., Kahan. Calculating the singular values and pseudo-inverse of a matrix. J. Soc. Industrial Appl. Math.: Ser. B, Numerical Anal., 2(2):205–224, 1965.
[81] G., Golub and C., van Loan. An analysis of the total least squares problem. SIAM J. Numerical Anal., 17:883–893, 1980.
[82] T. R., Golub, D. K., Slonim, C., Huardet al.Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
[83] G., Goodwin and K., Sin. Adaptive Filtering: Prediction and Control.Englewood Cliffs, NJ: Prentice Hall, 1984.
[84] C., Goutte. Note on free lunches and cross-validation. Neural Comput., 9:1211–1215, 1997.
[85] T., Graepel and K., Obermayer. Fuzzy topographic kernel clustering. In W., Brauer, editor, Proceedings of the Fifth GI Workshop Fuzzy Neuro Systems, pages 90–97, 1998.
[86] Z., Griliches and V., Ringstad. Errors-in-the-variables bias in nonlinear contexts. Econometrica, 38(2):368–370, 1970.
[87] S. R., Gunn. Support vector machines for classification and regression. USC-ISIS Technical ISIS Technical Report, 1998.
[88] J., Guo, M. W., Mak, and S. Y., Kung. Eukaryotic protein subcellular localization based on local pairwise profile alignment SVM. Proceedings, 2006 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '06), pages 416–422, 2006.
[89] I., Guyon, J., Weston, S., Barnhill, and V., Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389–422, 2002.
[90] B. H., Juang, S. Y., Kung, and Kamm, C. A. (Editors). Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, Princeton, NJ, 1991.
[91] B., Hammer, A., Hasenfuss, F., Rossi, and M., Strickert. Topographic processing of relational data. In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, 2007.
[92] J. A., Hartigan. Direct clustering of a data matrix. J. Am. Statist. Assoc., 67(337):123–129, 1972.
[93] T., Hastie, R., Tibshirani, M., Eisenet al.“Gene shaving” as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol., 1(2):research0003.1-research0003.21, 2000.
[94] S., Haykin. Adaptive Filter Theory, 3rd edition. Englewood Cliffs, NJ: Prentice Hall, 1996.
[95] S., Haykin. Neural Networks: A Comprehensive Foundation, 2nd edition. Englewood Cliffs, NJ: Prentice Hall, 2004.
[96] X., He, D., Cai, and P., Niyogi. Laplacian score for feature selection. In Advances in Neural Information Processing Systems 18.Cambridge, MA: MIT Press, 2005.
[97] M. A., Hearst, B., Schölkopf, S., Dumais, E., Osuna, and J., Platt. Trends and controversies -support vector machines. IEEE Intelligent Systems, 13:18–28, 1998.
[98] V. J., Hodge and J., Austin. A survey of outlier detection methodologies. Intell. Rev., 22:85–126, 2004.
[99] A. E., Hoerl and R. W., Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 42(1):80–86, 1970.
[99b] T., Hofmann, B., Schölkopf, and A. J., Smola. Kernel methods in machine learning. Ann. Statist., 36(3):1171–1220, 2008.
[100] H., Hotelling. Analysis of a complex of statistical variables into principal components. J. Educational Psychol., 24:498–520, 1933.
[101] C. W., Hsu and C. J., Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, 2002.
[102] D., Hsu, S. M., Kakade, J., Langford, and T., Zhang. Multi-label prediction via compressed sensing). In Advances in Neural Information Processing Systems 22, Cambridge, MA: MIT Press, pages 772–780, 2009.
[103] S., Hua and Z., Sun. A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. J. Molec. Biol., 308(2):397–W7, 2001.
[104] Y., Huang and Y. D., Li. Prediction of protein subcellular locations using fuzzy K-NN method. Bioinformatics, 20(1):21–28, 2004.
[105] P. J., Huber. Robust statistics: A review. Ann. Math. Statist., 43:1041–1067, 1972.
[106] P. J., Huber. Robust Statistics.New York: John Wiley and Sons, 1981.
[107] N., Iizuka, M., Oka, H., Yamada-Okabeet al.Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Lancet, 361(9361):923–929, 2003.
[108] R., Inokuchi and S., Miyamoto. LVQ clustering and SOM using a kernel function. In Proceedings of IEEE International Conference on Fuzzy Systems, Volume 3, pages 1497–1500, 2004.
[109] L. B., Jack and A. K., Nandi. Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms. Mechanical Systems Signal Processing, 16:373–390, 2002.
[110] P., Jafari and F., Azuaje. An assessment of recently published gene expression data analyses: Reporting experimental design and statistical factors. BMC Med. Inform., 6:27, 2006.
[111] A. K., Jain, M. N., Murty, and P. J., Flynn. Data clustering: A review. ACM Comput. Surveys, 31(3):264–323, 1999.
[112] W., James and C., Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, Volume 1. Berkeley, CA: University of California Press, pages 361–380, 1960.
[113] T., Joachims. Text categorization with support vector machines: Learning with many relevant features. Proceedings of European Conference on Machine Learning, Berlin: Springer, pages 137–142, 1997.
[114] T., Joachims. Making large-scale SVM learning practical. In B., Schölkopf, C., Burges, and A., Smola, editors, Advances in Kernel Methods - Support Vector Learning.Cambridge, MA: MIT Press, 1999.
[115] I. T., Jolliffe. Principal Component Analysis, 2nd edition. New York: Springer, 2002.
[116] E. M., Jordaan and G. F., Smits. Robust outlier detection using SVM regression. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, volume 3, pages 2017–2022, 2004.
[117] M. I., Jordan and C. M., Bishop. An Introduction to Probabilistic Graphical Models.Cambridge, MA: MIT Press, 2002.
[118] B. H., Juang, S. Y., Kung, and C. A., Kamm. IEEE Workshops on Neural Networks for Signal Processing.New York: IEEE Press, 1991.
[119] T., Kailath. Linear Systems.Englewood Cliffs, NJ: Prentice Hall, 1980.
[120] T., Kailath, A. H., Sayed, and B., Hassibi. Linear Esitmation.Englewood Cliffs, NJ: Prentice Hall, 2000.
[121] M., Kass, A., Witkin, and D., Terzopoulos. Snakes: Active contour models. Int. J. Computer Vision, 1:321–331, 1987.
[122] S. S., Keerthi, S. K., Shevade, C., Bhattacharyya, and K. R., K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13:637–649, 2001.
[123] J., Khan, J. S., Wei, M., Ringneret al.Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6):673–679, 2001.
[124] J., Khan, J. S., Wei, M., Ringneret al.Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7:673–679, 2001.
[125] K. I., Kim, K., Jung, S. H., Park, and H. J., Kim. Texture classification with kernel principal component analysis. Electron. Lett., 36(12):1021–1022, 2000.
[126] G. S., Kimeldarf and G., Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applications, 33:82–95, 1971.
[127] S., Kirkpatrick, C. D., Gelat, and M. P., Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[128] R., Kohavi and G. H., John. Wrappers for feature selection. Artif. Intell., 97(1-2):273–324, 1997.
[129] T., Kohonen. Self-organized formation of topologically correct feature map. Biol. Cybernet., 43:59–69, 1982.
[130] T., Kohonen. Self-Organization and Associative Memory.New York: Springer, 1984.
[131] T., Kohonen. Self-Organizing Maps, 2nd edition. Berlin: Springer, 1997.
[132] T., Kudo and Y., Matsumoto. Chunking with support vector machines. In Proceedings, North American Chapter of the Association for Computational Linguistics, 2001.
[133] H. T., Kung. Why systolic architectures?IEEE Computer, 15(1):37–46, 1982.
[134] S. Y., Kung. VLSI Array Processors.Englewood Cliffs, NJ: Prentice Hall, 1988.
[135] S. Y., Kung. Digital Neural Networks.Englewood Cliffs, NJ: Prentice Hall, 1993.
[136] S. Y., Kung. Kernel approaches to unsupervised and supervised machine learning. In Proceedings of PCM 2009, Bangkok, pages 1–32. Berlin: Springer-Verlag, 2009.
[137] S. Y., Kung, K. I., Diamantaras, and J. S., Taur. Adaptive principal component extraction (APEX) and applications. IEEE Trans. Signal Processing, 42(5):1202–1217, 1994.
[138] S. Y., Kung, F., Fallside, J. A., Sorensen, and C. A., Kamm (Editors). Neural Networks for Signal Processing II.Piscataway, NJ: IEEE, 1992.
[139] S. Y., Kung and Man-Wai, Mak. PDA-SVM hybrid: A unified model for kernel-based supervised classification. J. Signal Processing Systems, 65(1):5–21, 2011.
[140] S. Y., Kung and M. W., Mak. Feature selection for self-supervised classification with applications to microarray and sequence data. IEEE J. Selected Topics Signal Processing: Special Issue Genomic and Proteomic Signal Processing, 2(3):297–309, 2008.
[141] S. Y., Kung, M. W., Mak, and S. H., Lin. Biometric Authentication: A Machine Learning Approach.Upper Saddle River, NJ: Prentice Hall, 2005.
[142] S. Y., Kung, M. W., Mak, and I., Tagkopoulos. Multi-metric and multi-substructure biclus- tering analysis for gene expression data. In IEEE Computational Systems Bioinformatics Conference, Stanford, CA, 2005.
[143] S. Y., Kung, M. W., Mak, and I., Tagkopoulos. Symmetric and asymmetric multi- modality biclustering analysis for microarray data matrix. J. Bioinformatics Comput. Biol., 4(3):275–298, 2006.
[144] S. Y., Kung and Peiyuan, Wu. On efficient learning and classification kernel methods. In Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), Kyoto, 2012.
[145] S. Y., Kung and Peiyuan, Wu. Perturbation regulated kernel regressors for supervised machine learning. In Proceedings, 2012 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '12), 2012.
[146] S. Y., Kung and Yuhui, Luo. Recursive kernel trickfor networksegmentation. Int. J. Robust Nonlinear Control, 21(15):1807–1822, 2011.
[147] N., Lawrence and B., Schölkopf. Estimating a kernel Fisher discriminant in the presence of label noise. In Proceedings of the 18th International Conference on Machine Learning, San Francisco. New York: Morgan Kaufman, 2001.
[148] L., Lazzeroni and A. B., Owen. Plaid models for gene expression data. Technical report, 03, 2000 (www-stat.stanford.edu/owen/reports/plaid.pdf).
[149] K. H., Lee, S. Y., Kung, and N., Verma. Improving kernel-energy trade-offs for machine learning in implantable and wearable biomedical applications. In Proceedings of ICASSP, pages 1597–1600, 2011.
[150] K. H., Lee, S. Y., Kung, and N., Verma. Low-energy formulations of support vector machine kernel functions for biomedical sensor applications. Journal of Signal Processing Systems, Berlin: Springer, published online, 2012.
[151] B., Levy. Principles of Signal Detection and Parameter Estimation.Berlin: Springer, 2008.
[152] Dan, Li. Performance Evaluation of Maximum Log-Likelihood Classification, ELE 571 Course Project Report, Princeton University, Princeton, NJ, 2010.
[153] L. Q., Li, Y., Zhang, L. Y., Zou, Y., Zhou, and X. Q., Zheng. Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition. Protein Peptide Lett., 19:375–387, 2012.
[154] T., Li and M., Ogihara. Toward intelligent music information retrieval. IEEE Trans. Multimedia, 8(3):564–574, 2006.
[155] Y., Liang, H., Wu, R., Leiet al.Transcriptional network analysis identifies BACH1 as a master regulator of breast cancer bone metastasis. J. Biol. Chem., 287(40):33533–33544, 2012.
[156] Opensource EEG libraries and toolkit. http://www.goomedic.com/opensource-eeg-libraries-and-toolkits-for-developers.html.
[157] W., Liu, J. C., Principe, and S., Haykin. Kernel Adaptive Filtering: A Comprehensive Introduction.New York: Wiley, 2010.
[158] L., Ljung. System Identification: Theory for the User.Englewood Cliffs, NJ: Prentice Hall, 1999.
[159] Z.-P., Lo, M., Fujita, and B., Bavarian. Analysis of neighborhood interaction in Kohonen neural networks. In Proceedings, 6th International Parallel Processing Symposium, Los Alamitos, CA, pages 247–249, 1991.
[160] Z.-P., Lo, Y., Yu, and B., Bavarian. Analysis of the convergence properties of topology preserving neural networks. IEEE Trans. Neural Networks, 4:207–220, 1993.
[161] G., Lubec, L, Afjehi-Sadat, J. W., Yang, and J. P., John. Searching for hypothetical proteins: Theory and practice based upon original data and literature. Prog. Neurobiol., 77:90–127, 2005.
[162] S., Ma and J., Huang. Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics, 21(24):4356–4362, 2005.
[163] D., MacDonald and C., Fyfe. The kernel self organising map. In Proceedings of 4th International Conference on Knowledge-Based Intelligence Engineering Systems and Applied Technologies, 2000.
[164] M., MacQueen. Some methods for classification and analysis of multivariate observation. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Proba¬bilities (L. M., LeCun and J., Neyman, editors, volume 1, pages 281–297. Berkeley, CA: University of California Press, 1967.
[165] P. C., Mahalanobis. On the generalised distance in statistics. J. Proc. Asiatic Soc. Bengal, 2:49–55, 1936.
[166] M. W., Mak, J., Guo, and S. Y., Kung. PairProSVM: Protein subcellular localization based on local pairwise profile alignment and SVM. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 5(3):416–422, 2008.
[167] M. W., Mak and S. Y., Kung. A solution to the curse of dimensionality problem in pairwise scoring techniques. In International Conference on Neural Information Processing, pages 314–323, 2006.
[168] M. W., Mak and S. Y., Kung. Fusion of feature selection methods for pairwise scoring SVM. Neurocomputing, 71(16-18):3104–3113, 2008.
[169] M. W., Mak and S. Y., Kung. Low-power SVM classifiers for sound event classification on mobile devices. In Proceedings of ICASSP, Kyoto, 2012.
[170] O. L., Mangasarian, G., Fung, and J. W., Shavlik. Knowledge-based nonlinear kernel clas- sifers. In Learning Theory and Kernel Machines.Berlin: Springer-Verlag, pages 102–113, 2003.
[171] H. B., Mann and D. R., Whitney. On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18:50–60, 1947.
[172] Mathworks-SVM. Mathworks bioinformatics toolbox.
[173] M., Mavroforakis and S., Theodoridis. A geometric approach to support vector machine (SVM) classification. IEEE Trans. Neural Networks, 17(3):671–682, 2006.
[174] G. J., McLachlan. Discriminant Analysis and Statistical Pattern Recognition. New York: John Wiley & Sons, 1992.
[175] J., Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Trans. London Phil. Soc., A209:415–446, 1909.
[176] S., Mika. Kernel Fisher Discriminants. PhD thesis, The Technical University of Berlin, Berlin, 2002.
[177] S., Mika, G., Ratsch, and K. R., Muller. A mathematical programming approach to the kernel Fisher algorithm. In Advances in Neural Information Processing Systems 14.Cambridge, MA: MIT press, pages 591–597, 2001.
[178] S., Mika, G., Ratsch, J., Weston, B., Schölkopf, and K. R., Mullers. Fisher discriminant analysis with kernels. In Y. H., Hu, J., Larsen, E., Wilson, and S., Douglas, editors, Neural Networks for Signal Processing IX, pages 41–48, 1999.
[179] S., Mika, A.J., Smola, and B., Schölkopf. An improved training algorithm for kernel Fisher discriminants. In T., Jaakkola and T., Richardson, editors, Proceedings AISTATS, San Francisco, CA, pages 98–104. New York: Morgan Kaufmann, 2001.
[180] B., Mirkin. Mathematical Classification and Clustering.Berlin: Springer, 1996.
[181] T. M., Mitchell. Machine Learning.New York: McGraw-Hill, 1997.
[182] P. J., Moreno, P. P., Ho, and N., Vasconcelos. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. Technical Report, HP Laboratories Cambridge, 2004.
[183] MSPsim. http://www.sics.se/project/mspsim.
[184] S., Mukherjee, E., Osuna, and F., Girosi. Nonlinear prediction of chaotic time series using support vector machines. In J., Principe, L., Giles, N., Morgan, and E., Wilson, editors, Proceedings, IEEE Workshop on Neural Networks for Signal Processing, Amelia Island, FL, pages 276–285, 1997.
[185] K. R., Muller, S., Mika, G., Ratsch, K., Tsuda, and B., Schölkopf. An introduction to kernelbased learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001.
[186] K. R., Muller, A., Smola, G., Ratschet al.Predicting time series with support vector machines. In Proceedings, International Conference on Artificial Neural Networks, London: Springer-Verlag, pages 999–1004, 1997.
[187] K. R., Muller, S., Mika, G., Ratsch, K., Tsuda, and B., Schölkopf. An introduction to kernelbased learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001.
[188] N., Murata, K. R., Muller, A., Ziehe, and S., Amari. Adaptive on-line learning in changing environments. In M. C., Mozer, M. I., Jordan, and T, Petsche, editors, Advances in Neural Information Processing Systems 9.Cambridge, MA: MIT press, pages 599–605, 1997.
[189] C. L., Myers. Context-sensitive methods for learning from genomic data. Thesis, Department of Electrical Engineering, Princeton University, Princeton, NJ, 2007.
[190] C. L., Myers, M., Dunham, S. Y., Kung, and O., Troyanskaya. Accurate detection of aneuploidies in array CGH and gene expression microarray data. In Bioinformatics. Published online, Oxford University Press, 2005.
[191] E. A., Nadaraya. On estimating regression. Theory Probability Applications, 9:141–142, 1964.
[192] Neural network frequently asked questions. http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html.
[193] J., Neyman and E. S., Pearson. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20:175–240, 1928.
[194] S., Niijima and Y., Okuno. Laplacian linear discriminant analysis approach to unsupervised feature selection. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 6(4):605–614, 2009.
[195] C. L., Nutt, D. R., Mani, R. A., Betensky et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res., 63(7):1602–1607, 2003.
[196] E., Oja. A simplified neuron model as a principal component analyzer. J. Math. Biol., 15:267–273, 1982.
[197] E., Osuna, R., Freund, and F., Girosi. An improved training algorithm for support vector machines. In J., Principe, L., Giles, N., Morgan, and E., Wilson, editors, Proceedings, IEEE Workshop on Neural Networks for Signal Processing VII, Amelia Island, FL, pages 276–285, 1997.
[198] D., Parker. Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, MIT, Cambridge, MA, 1985.
[199] E., Parzen. On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065–1076, 1962.
[200] P., Pavlidis, J., Weston, J., Cai, and W. N., Grundy. Gene functional classification from heterogeneous data. In International Conference on Computational Biology, Pittsburgh, PA, pages 249–255, 2001.
[201] K., Pearson. On lines and planes of closest fit to systems of points in space. Phil. Mag. Ser. 6, 2:559–572, 1901.
[202] M. S., Pepe. The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford: Oxford University Press, 2003.
[203] PhysioNet. http://www.physionet.org.
[204] J. C., Platt. Fast training of support vector machines using sequential minimal optimization. In B., Schölkopf, C. J. C., Burges, and A. J., Smola, editors, Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, pages 185–208, 1999.
[205] J. C., Platt. Using analytic QP and sparseness to speed training of support vector machines. In Advances in Neural Information Processing Systems 10, 1998.
[206] N., Pochet, F., De Smet, J. A. K., Suykens, and B. L. R., De Moor. Systematic benchmarking of microarray data classification: Assessing the role of nonlinearity and dimensionality reduction. Bioinformatics, 20(17):3185–3195, 2004.
[207] T., Poggio and F., Girosi. Networks for approximation and learning. Proc. IEEE, 78(9):1481–1497, 1990.
[208] S. L., Pomeroy, P., Tamayo, M., Gaasenbeek et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870):436–442, 2002.
[209] M., Pontil and A., Verri. Support vector machines for 3D object recognition. IEEE Trans. Pattern Analysis Machine Intell., 20:637–646, 1998.
[210] H. V., Poor. An Introduction to Signal Detection and Estimation, 2nd edition. Berlin: Springer, 1994.
[211] D. M., Pozar. Microwave Engineering, 3rd edition. New York: Wiley, 2005.
[212] J. C., Rajapakse, K. B., Duan, and W. K., Yeo. Proteomic cancer classification with mass spectrometry data. Am. J. Pharmacogenomics, 5(5):281–292, 2005.
[213] S., Ramaswamy, P., Tamayo, R., Rifkin et al. Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, 98(26):15149–15154, 2001.
[214] J., Read, B., Pfahringer, G., Holmes, and E., Frank. Classifier chains for multi-label classification. In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 254–269, 2009.
[215] M., Reich, K., Ohm, M., Angelo, P., Tamayo, and J. P., Mesirov. GeneCluster 2.0: An advanced toolset for bioarray analysis. Bioinformatics, 20(11):1797–1798, 2004.
[216] A., Reinhardt and T., Hubbard. Using neural networks for prediction of the subcellular location of proteins. Nucl. Acids Res., 26:2230–2236, 1998.
[217] B. D., Ripley. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press, 1996.
[218] J., Rissanen. A universal prior for integers and estimation by minimum description length. Ann. Statist., 11(2):416–431, 1983.
[219] H., Ritter, T., Martinetz, and K., Schulten. Neural Computation and Self-Organizing Maps: An Introduction. Reading, MA: Addison-Wesley, 1992.
[220] F., Rosenblatt. The perceptron: A probabilistic model for information storage and organization of the brain. Psychol. Rev., 65:42–99, 1958.
[221] M., Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27:832–837, 1956.
[222] M., Rosenblatt. Density estimates and Markov sequences. In M., Puri, editor, Nonparametric Techniques in Statistical Inference. London: Cambridge University Press, pages 199–213, 1970.
[223] V., Roth, J., Laub, J. M., Buhmann, and K.-R., Muller. Going metric: Denoising pairwise data. In Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, pages 817–824, 2003.
[224] V., Roth and V., Steinhage. Nonlinear discriminant analysis using kernel functions. In S. A., Solla, T. K., Leen, and K.-R., Muller, editors, Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, pages 568–574, 2000.
[225] R., Rothe, Yinan, Yu, and S. Y., Kung. Parameter design tradeoff between prediction performance and training time for ridge-SVM. In Proceedings, 2013 IEEE International Workshop on Machine Learning For Signal Processing, Southampton, 2013.
[226] D. E., Rumelhart, G. E., Hinton, and R. J., Williams. Learning internal representations by error propagation. In D. E., Rumelhart, J. L., McClelland, and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. Cambridge, MA: MIT Press/Bradford Books, 1986.
[227] T. D., Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459–473, 1989.
[228] A., Sayed. Fundamentals of Adaptive Filtering. New York: Wiley, 2003.
[229] A., Sayed and T., Kailath. A state space approach to adaptive RLS filtering. IEEE Signal Processing Mag., 11:18–60, 1994.
[230] A. H., Sayed. Fundamentals of Adaptive Filtering. New York: Wiley, 2003 (see page 30).
[231] R. E., Schapire and Y., Singer. Boostexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000.
[232] L., Scharf. Statistical Signal Processing. Reading, MA: Addison-Wesley, 1991.
[233] S. M., Schennach. Nonparametric regression in the presence of measurement error. Econometric Theory, 20(6):1046–1093, 2004.
[234] B., Schölkopf. Statistical learning and kernel methods. Technical Report MSR-TR 200023, Microsoft Research, 2000.
[235] B., Schölkopf, C., Burges, and V., Vapnik. Incorporating invariances in support vector learning machines. In Proceedings, International Conference on Artificial Neural Networks, 1996.
[236] B., Schölkopf, R., Herbrich, A., Smola, and R., Williamson. A generalized representer theorem. NeuroCOLT2 Technical Report Series, NC2-TR-2000-82, 2000.
[237] B., Schölkopf, J. C., Platt, J., Shawe-Taylor, A. J., Smola, and R. C., Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443–1472, 2001.
[238] B., Schölkopf, A., Smola, and K.-R., Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10:1299–1319, 1998.
[238b] B., Schölkopf, A. J., Smola, R. C., Williamson, and P. L., Bartlett. New support vector algorithms. Neural Comput., 12:1207–1245, 2000.
[239] B., Schölkopf and A. J., Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002.
[240] B., Schölkopf, R. C., Williamson, A. J., Smola, J., Shawe-Taylor, and J. C., Platt. Support vector method for novelty detection. In S. A., Solla, T. K., Leen, and K.-R., Muller, editors, Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, pages 568–574, 2000.
[241] A., Schwaighofer. SVM toolbox for MATLAB.
[242] G., Schwartz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978.
[243] D. J., Sebald and J. A., Bucklew. Support vector machine techniques for nonlinear equalization. IEEE Trans. Signal Processing, 48(11):3217–3226, 2000.
[244] J., Shawe-Taylor and N., Cristianini. Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press, 2004.
[245] H. B., Shen and K. C., Chou. Virus-mPLoc: A fusion classifier for viral protein subcellular location prediction by incorporating multiple sites. J. Biomol. Struct. Dyn., 26:175–186, 2010.
[246] P., Simard, A., Smola, B., Schölkopf, and V., Vapnik. Prior knowledge in support vector kernels. In Advances in Neural Information Processing Systems 10, pages 640–646, 1998.
[247] D., Singh, P. G., Febbo, K., Rosset al.Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–209, 2002.
[248] I., Yamada, K., Slavakis, and S., Theodoridis. Online classification using kernels and projection-based adaptive algorithms. IEEE Trans. Signal Processing, 56(7):2781–2797, 2008.
[249] T. F., Smith and M. S., Waterman. Comparison of biosequences. Adv. Appl. Math., 2:482–489, 1981.
[250] A. J., Smola, B., Schölkopf, and K. R., Müller. The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649, 1998.
[251] A. J., Smola, P. L., Bartlett, B., Schölkopf, and D., Schuurmans. Advances in Large Margin Classifiers. Cambridge, MA: MIT Press, 2000.
[252] P. H. A., Sneath and R. R., Sokal. Numerical Taxonomy: The Principles and Practice of Numerical Classification. San Francisco, CA: W. H. Freeman, 1973.
[253] M., Song, C., Breneman, J., Bi et al. Prediction of protein retention times in anion-exchange chromatography systems using support vector regression. J. Chem. Information Computer Sci., 42:1347–1357, 2002.
[254] M. H., Song, J., Lee, S. P., Cho, K. J., Lee, and S. K., Yoo. Support vector machine based arrhythmia classification using reduced features. Int. J. Control, Automation, Systems, 3:571–579, 2005.
[255] T., Sørlie, R., Tibshirani, J., Parker et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. PNAS, 100(14):8418–8423, 2003.
[256] D. F., Specht. Probabilistic neural networks. Neural Networks, 3:109–118, 1990.
[257] P. T., Spellman, G., Sherlock, M. Q., Zhang et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12):3273–3297, 1998.
[258] M. O., Stitson and J. A. E., Weston. Implementational issues of support vector machines. Technical Report CSD-TR-96-18, Computational Intelligence Group, Royal Holloway, University of London, 1996.
[259] G., Strang. Introduction to Linear Algebra.Wellesley, MA: Wellesley Cambridge Press, 2003.
[260] J. A. K., Suykens and J., Vandewalle. Least squares support vector machine classifiers. Neural Processing Lett., 9(3):293–300, 1999.
[261] SVMlight. http://svmlight.joachims.org/.
[262] I., Tagkopoulos, N., Slavov, and S. Y., Kung. Multi-class biclustering and classification based on modeling of gene regulatory networks. In Proceedings, Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE '05), Minneapolis, MN, pages 89–96, 2005.
[263] P., Tamayo, D., Slonim, J., Mesirov et al. Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Nat. Acad. Sci. USA, 96:2907–2912, 1999.
[264] S., Tavazoie, D., Hughes, M. J., Campbell, R. J., Cho, and G. M., Church. Systematic determination of genetic network architecture. Nature Genetics, 22:281–285, 1999.
[265] D. M. J., Tax and R. P. W., Duin. Data domain description using support vectors. In M., Verleysen, editor, Proceedings of the European Symposium on Artificial Neural Networks, ESANN '99, Brussels, pages 251–256, 1999.
[266] D. M. J., Tax and R. P. W., Duin. Support vector domain description. Pattern Recognition Lett., 20:1191–1199, 1999.
[267] S., Theodoridis and K., Koutroumbas. Pattern Recognition, 4th edition. New York: Academic Press, 2008.
[268] R., Tibshirani. Regression shrinkage and selection via the LASSO. J. Royal Statist. Soc. B, 58:267–288, 1996.
[269] F., Tobar, D., Mandic, and S. Y., Kung. The multikernel least mean square algorithm. IEEE Trans. Neural Networks Learning Systems, accepted for publication, 2013.
[270] C., Tong, V., Svetnik, B., Schölkopf et al. Novelty detection in mass spectral data using a support vector machine method. In Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000.
[271] S., Tong and D., Koller. Support vector machine active learning with applications to text classification. J. Machine Learning Res., 2:45–66, 2002.
[272] L. N., Trefethen and D., Bau III. Numerical Linear Algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1997.
[273] G., Tsoumakas and I., Katakis. Multi-label classification: An overview. Int. J. Data Warehousing Mining, 3:1–13, 2007.
[274] G., Tsoumakas, I., Katakis, and I., Vlahavas. Mining multi-label data. In O., Maimon and L., Rokach, editors, Data Mining and Knowledge Discovery Handbook, 2nd edition. Berlin: Springer, 2010.
[275] A. N., Tychonoff. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.
[276] I., Van Mechelen, H. H., Bock, and P., De Boeck. Two-mode clustering methods: A structured overview. Statist. Methods Med. Res., 13(5):363–394, 2004.
[277] L. J., van't Veer, Hongyue, Dai, M. J., van de Vijver et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536, 2002.
[278] V., Vapnik. Estimation of dependences based on empirical data [in Russian]. Moscow, Nauka, 1979. (English translation New York: Springer, 1982.)
[279] V., Vapnik, S., Golowich, and A., Smola. Support vector method for function approximation, regression estimation, and signal processing. In M., Mozer, M., Jordan, and T., Petsche, editors, Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, pages 281–287, 1997.
[280] V. N., Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[281] V. N., Vapnik. Statistical Learning Theory. New York: Wiley, 1998.
[282] C., Vens, J., Struyf, L., Schietgat, S., Dzeroski, and H., Blockeel. Decision trees for hierarchical multi-label classification. Machine Learning, 2(73):185–214, 2008.
[283] N., Villa and F., Rossi. A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph. In Proceedings of the 6th International Workshop on Self-Organizing Maps. Bielefeld: Bielefeld University, 2007.
[284] G., Wahba. Spline Models for Observational Data. Philadelphia, PA: SIAM, 1990.
[285] Shibiao, Wan, Man-Wai, Mak, and S. Y., Kung. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines. BMC Bioinformatics, 13:290, 2012 (available at http://link.springer.com/article/10.1186/1471-2105-13-290/fulltext.html).
[286] Shibiao, Wan, Man-Wai, Mak, and S. Y., Kung. Adaptive thresholding for multi-label SVM classification with application to protein subcellular localization prediction. In Proceedings of ICASSP '13, Vancouver, pages 3547–3551, 2013.
[287] Jeen-Shing, Wang and Jen-Chieh, Chiang. A cluster validity measure with outlier detection for support vector clustering. IEEE Trans. Systems, Man, Cybernet. B, 38:78–89, 2008.
[288] L., Wang, J., Zhu, and H., Zou. The doubly regularized support vector machine. Statist. Sinica, 16:589–615, 2006.
[289] Y., Wang, A., Reibman, F., Juang, T., Chen, and S. Y., Kung. In Proceedings of the IEEE Workshops on Multimedia Signal Processing. Piscataway, NJ: IEEE Press, 1997.
[290] Z., Wang, S. Y., Kung, J., Zhang et al. Computational intelligence approach for gene expression data mining and classification. In Proceedings of the IEEE International Conference on Multimedia & Expo. Piscataway, NJ: IEEE Press, 2003.
[291] Z., Wang, Y., Wang, J., Lu et al. Discriminatory mining of gene expression microarray data. J. Signal Processing Systems, 35:255–272, 2003.
[292] J. H., Ward. Hierarchical grouping to optimize an objective function. J. Am. Statist. Assoc., 58:236–244, 1963.
[293] G. S., Watson. Smooth regression analysis. Sankhya: Indian J. Statist. Ser. A, 26:359–372, 1964.
[294] P. J., Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavior Science. PhD thesis, Harvard University, Cambridge, MA, 1974.
[295] J., Weston, A., Elisseeff, G., BakIr, and F., Sinz. http://www.kyb.tuebingen.mpg.de/bs/people/spider/main.html.
[296] J., Weston, A., Elisseeff, B., Schölkopf, and M., Tipping. Use of the zero-norm with linear models and kernel methods. J. Machine Learning Res., 3:1439–1461, 2003.
[297] J., Weston and C., Watkins. Multi-class support vector machines. In Proceedings of ESANN, Brussels, 1999.
[298] B., Widrow and S. D., Stearns. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 1985.
[299] N., Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Cambridge, MA: MIT Press, 1949.
[299b] S., Winters-Hilt, A., Yelundur, C., McChesney, and M., Landry. Support vector machine implementations for classification clustering. BMC Bioinformatics, 7:S4, published online, 2006.
[300] L., Wolf and A., Shashua. Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weight-based approach. J. Machine Learning Res., 6:1855–1887, 2005.
[301] M. A., Woodbury. Inverting modified matrices. In Statistical Research Group Memorandum Report 42, MR38136, Princeton University, Princeton, NJ, 1950.
[302] Peiyuan, Wu and S. Y., Kung. Kernel-induced optimal linear estimators and generalized Gauss-Markov theorems. Submitted 2013.
[302b] Peiyuan, Wu, C. C., Fang, J. M., Chang, S., Gilbert, and S. Y., Kung. Cost-effective kernel ridge regression implementation for keystroke-based active authentication system. In Proceedings of ICASSP '14, Florence, Italy, 2014.
[303] Z. D., Wu, W. X., Xie, and J. P., Yu. Fuzzy c-means clustering algorithm based on kernel method. In Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, pages 49–54, 2003.
[304] X., Xiao, Z. C., Wu, and K. C., Chou. iLoc-Virus: A multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. J. Theor. Biol., 284:42–51, 2011.
[305] E. P., Xing and R. M., Karp. CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics, 17(90001):306–315, 2001.
[306] H., Xu, C., Caramanis, and S., Mannor. Robust regression and LASSO. In Advances in Neural Information Processing Systems 21. Cambridge, MA: MIT Press, pages 1801–1808, 2009.
[307] Rui, Xu and D., Wunsch II. Survey of clustering algorithms. IEEE Trans. Neural Networks, 16(3):645–678, 2005.
[308] L., Yan, R., Dodier, M. C., Mozer, and R., Wolniewicz. Optimizing classifier performance via the Wilcoxon-Mann-Whitney statistics. In Proceedings of the International Conference on Machine Learning, pages 848–855, 2003.
[309] Haiqin, Yang, Kaizhu, Huang, Laiwan, Chan, I., King, and M. R., Lyu. Outliers treatment in support vector regression for financial time series prediction. In ICONIP '04, pages 1260–1265, 2004.
[310] Yinan, Yu, K., Diamantaras, T., McKelvey, and S. Y., Kung. Ridge-adjusted slack variable optimization for supervised classification. In IEEE International Workshop on Machine Learning for Signal Processing, Southampton, 2013.
[311] Yinan, Yu, T., McKelvey, and S. Y., Kung. A classification scheme for “high-dimensional-small-sample-size” data using SODA and ridge-SVM with medical applications. In Proceedings, 2013 International Conference on Acoustics, Speech, and Signal Processing, 2013.
[311b] Yinan, Yu, T., McKelvey, and S. Y., Kung. Kernel SODA: A feature reduction technique using kernel based analysis. In Proceedings, 12th International Conference on Machine Learning and Applications (ICMLA '13), volume 4B, page 340.
[312] G.-X., Yuan, C.-H., Ho, and C.-J., Lin. Recent advances of large-scale linear classification. Proc. IEEE, 100:2584–2603, 2012.
[313] M., Yukawa. Multikernel adaptive filtering. IEEE Trans. Signal Processing, 60(9):4672–4682, 2012.
[314] A. L., Yuille and N. M., Grzywacz. The motion coherence theory. In Proceedings, International Conference on Computer Vision, pages 344–353, 1988.
[315] D.-Q., Zhang and S.-C., Chen. A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artif. Intell. Med., 32:37–50, 2004.
[316] Lei, Zhang, Fuzong, Lin, and Bo, Zhang. Support vector machine learning for image retrieval. In Proceedings of the 2001 International Conference on Image Processing, volume 2, pages 721–724, 2001.
[317] M. Q., Zhang. Computational prediction of eukaryotic protein-coding genes. Nature Rev. Genetics, 3(9):698–709, 2002.
[318] X. G., Zhang, X., Lu, Q., Shiet al.Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics, 7(197), 2006.
[319] Z., Zhang, X. D., Gu, and S. Y., Kung. Color-frequency-orientation histogram based image retrieval. In Proceedings of ICASSP, Kyoto, 2012.
[319b] Z., Zhang, G., Page, and H., Zhang. Applying classification separability analysis to microarray data. In S. M., Lin and K. F., Johnson, editors, Methods of Microarray Data Analysis, CAMDA '00. Boston, MA: Kluwer Academic Publishers, pages 125–136, 2001.
[320] J., Zhu, S., Rosset, T., Hastie, and R., Tibshirani. 1-norm SVMs. In Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004.
[321] H., Zou and T., Hastie. Regularization and variable selection via the elastic net. J. Royal Statist. Soc., Ser. B, 67(2):301–320, 2005.