

Published online by Cambridge University Press:  05 July 2014

S. Y. Kung
Princeton University, New Jersey


Publisher: Cambridge University Press
Print publication year: 2014


[1] C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. In Proceedings of ACM SIGMOD, pages 37–46, 2001.
[2] A. C. Aitken. On least squares and linear combinations of observations. Proc. Royal Soc. Edinburgh, 55:42–48, 1935.
[3] H. Akaike. A new look at the statistical model identification. IEEE Trans. Automatic Control, 19(6):716–723, 1974.
[4] A. A. Alizadeh, M. B. Eisen, R. E. Davis et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511, 2000.
[5] U. Alon, N. Barkai, D. A. Notterman et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. USA, 96(12):6745, 1999.
[6] P. Andras. Kernel-Kohonen networks. Int. J. Neural Systems, 12:117–135, 2002.
[7] S. A. Armstrong, J. E. Staunton, L. B. Silverman et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1):41–47, 2002.
[8] N. Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337–404, 1950.
[9] P. Baldi and S. Brunak. Bioinformatics: The Machine Learning Approach, 2nd edition. Cambridge, MA: MIT Press, 2001.
[10] G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 12:2385–2404, 2000.
[11] D. G. Beer, S. L. R. Kardia, C.-C. Huang et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med., 8:816–824, 2002.
[12] R. Bellman. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[13] R. Bellman. Adaptive Control Processes: A Guided Tour. Princeton, NJ: Princeton University Press, 1961.
[14] S. Ben-David and M. Lindenbaum. Learning distributions by their density levels: A paradigm for learning without a teacher. J. Computer System Sci., 55:171–182, 1997.
[15] A. Ben-Dor, L. Bruhn, N. Friedman et al. Tissue classification with gene expression profiles. J. Comput. Biol., 7:559–583, 2000.
[16] A. Ben-Hur, D. Horn, H. Siegelmann, and V. Vapnik. A support vector method for hierarchical clustering. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press.
[17] D. P. Bertsekas. Nonlinear Programming. Belmont, MA: Athena Scientific, 1995.
[18] C. Bhattacharyya. Robust classification of noisy data using second order cone programming approach. In Proceedings, International Conference on Intelligent Sensing and Information Processing, pages 433–438, 2004.
[19] J. Bibby and H. Toutenburg. Prediction and Improved Estimation in Linear Models. New York: Wiley, 1977.
[20] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.
[21] C. M. Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Comput., 7:108–116, 1995.
[22] C. M. Bishop. Pattern Recognition and Machine Learning. Berlin: Springer, 2006.
[23] K. L. Blackmore, R. C. Williamson, I. M. Mareels, and W. A. Sethares. Online learning via congregational gradient descent. In Proceedings of the 8th Annual Conference on Computational Learning Theory (COLT '95). New York: ACM Press, pages 265–272, 1995.
[24] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.
[25] R. Boulet, B. Jouve, F. Rossi, and N. Villa. Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7-9):1257–1273, 2008.
[26] M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
[27] P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In International Conference on Machine Learning, pages 82–90, 1998.
[28] U. M. Braga-Neto and E. R. Dougherty. Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3):378–380, 2004.
[29] M. P. S. Brown, W. N. Grundy, D. Lin et al. Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Nat. Acad. Sci. USA, 97(1):262–267, 2000.
[30] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Knowledge Discovery Data Mining, 2(2):121–167, 1998.
[31] F. Camastra and A. Verri. A novel kernel method for clustering. IEEE Trans. Pattern Anal. Machine Intell., 27(5):801–804, 2005.
[32] C. Campbell and K. P. Bennett. A linear programming approach to novelty detection. In Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2001.
[33] H. Cao, T. Naito, and Y. Ninomiya. Approximate RBF kernel SVM and its applications in pedestrian classification. In Proceedings, The 1st International Workshop on Machine Learning for Vision-based Motion Analysis - MLVMA08, 2008.
[33b] J. Morris Chang, C. C. Fang, K. H. Ho et al. Capturing cognitive fingerprints from keystroke dynamics. IT Professional, 15(4):24–28, 2013.
[34] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Technol., 2(27):1–27, 2011. Software available online.
[35] O. Chapelle, P. Haffner, and V. N. Vapnik. Support vector machines for histogram-based image classification. IEEE Trans. Neural Networks, 10:1055–1064, 1999.
[36] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 46:131–159, 2002.
[37] P. H. Chen, C. J. Lin, and B. Schölkopf. A tutorial on ν-support vector machines, 2003.
[38] Y. Cheng and G. M. Church. Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), volume 8, pages 93–103, 2000.
[39] V. Cherkassky and F. Mulier. Learning from Data: Concepts, Theory and Methods. New York: John Wiley, 1998.
[40] A. Clare and R. D. King. Knowledge discovery in multi-label phenotype data. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 42–53, 2001.
[41] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995.
[42] R. Courant and D. Hilbert. Methods of Mathematical Physics. New York: Interscience, 1953.
[43] R. Courant and D. Hilbert. Methods of Mathematical Physics, volumes I and II. New York: Wiley Interscience, 1970.
[44] T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Computers, 14:326–334, 1965.
[44b] T. F. Cox and M. A. A. Cox. Multidimensional Scaling. London: Chapman and Hall, 1994.
[45] Data set provider.
[46] P. de Chazal, M. O'Dwyer, and R. B. Reilly. Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng., 51(7):1196–1206, 2004.
[47] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc., Ser. B, 39(1):1–38, 1977.
[48] I. S. Dhillon, Y. Guan, and B. Kulis. Kernel K-means, spectral clustering and normalized cuts. In Proceedings of the 10th ACM KDD Conference, Seattle, WA, 2004.
[49] K. Diamantaras and M. Kotti. Binary classification by minimizing the mean squared slack. In Proceedings of the IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP-2012), Kyoto, pages 2057–2060, 2012.
[50] K. I. Diamantaras and S. Y. Kung. Principal Component Neural Networks. New York: Wiley, 1996.
[51] T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res., 2:263–286, 1995.
[52] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector regression machines. In Advances in Neural Information Processing Systems (NIPS '96), Volume 9. Cambridge, MA: MIT Press, pages 155–161, 1997.
[53] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[54] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd edition. New York: Wiley, 2011.
[55] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Technical Report 576, Department of Statistics, University of California, Berkeley, CA, 2000.
[56] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Statist. Assoc., 97:77–88, 2002.
[57] R. Durbin and D. J. Willshaw. An analogue approach to the travelling salesman problem using an elastic net method. Nature, 326:689–691, 1987.
[58] B. Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist., 7:1–26, 1979.
[59] B. Efron. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1982.
[60] B. Efron. Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Am. Statist. Assoc., 78:316–331, 1983.
[61] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA, 95:14863–14868, 1998.
[62] Y. Engel, S. Mannor, and R. Meir. The kernel recursive least-squares algorithm. IEEE Trans. Signal Processing, 52(8):2275–2285, 2004.
[63] B. Schölkopf, C. J. C. Burges, and A. J. Smola (editors). Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, 1999.
[64] M. Aizerman, E. A. Braverman, and L. Rozonoer. Theoretical foundation of the potential function method in pattern recognition learning. Automation Remote Control, 25:821–837, 1964.
[65] T. Graepel, R. Herbrich, P. Bollman-Sdorra, and K. Obermayer. Classification on pairwise proximity data. Advances in Neural Information Processing Systems 11. Cambridge, MA: MIT Press, pages 438–444, 1999.
[66] T. R. Golub, D. K. Slonim, P. Tamayo et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
[67] M. Filippone, F. Camastra, F. Masulli, and S. Rosetta. A survey of kernel and spectral methods for clustering. Pattern Recognition, 41:176–190, 2008.
[68] R. A. Fisher. The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7:179–188, 1936.
[69] R. Fletcher. Practical Methods of Optimization, 2nd edition. New York: Wiley, 1987.
[70] E. W. Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21:768–769, 1965.
[71] R. J. Fox and M. W. Dimmic. A two-sample Bayesian t-test for microarray data. BMC Bioinformatics, 7(1):126, 2006.
[72] Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
[73] K. Fukunaga. Introduction to Statistical Pattern Recognition, 2nd edition. Amsterdam: Elsevier, 1990.
[73b] G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In Proceedings, ACM KDD01, San Francisco, 2001.
[74] T. S. Furey, N. Cristianini, N. Duffy et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906–914, 2000.
[75] I. Gat-Viks, R. Sharan, and R. Shamir. Scoring clustering solutions by their biological relevance. Bioinformatics, 19(18):2381–2389, 2003.
[76] T. V. Gestel, J. A. K. Suykens, G. Lanckriet et al. Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Comput., 14(5):1115–1147, 2002.
[77] F. D. Gibbons and F. P. Roth. Judging the quality of gene expression-based clustering methods using gene annotation. Genome Res., 12:1574–1581, 2002.
[78] M. Girolami. Mercer kernel based clustering in feature space. IEEE Trans. Neural Networks, 13(3):780–784, 2002.
[79] G. Golub and C. F. Van Loan. Matrix Computations, 3rd edition. Baltimore, MD: Johns Hopkins University Press, 1996.
[80] G. H. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. J. Soc. Industrial Appl. Math.: Ser. B, Numerical Anal., 2(2):205–224, 1965.
[81] G. Golub and C. van Loan. An analysis of the total least squares problem. SIAM J. Numerical Anal., 17:883–893, 1980.
[82] T. R. Golub, D. K. Slonim, C. Huard et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
[83] G. Goodwin and K. Sin. Adaptive Filtering: Prediction and Control. Englewood Cliffs, NJ: Prentice Hall, 1984.
[84] C. Goutte. Note on free lunches and cross-validation. Neural Comput., 9:1211–1215, 1997.
[85] T. Graepel and K. Obermayer. Fuzzy topographic kernel clustering. In W. Brauer, editor, Proceedings of the Fifth GI Workshop Fuzzy Neuro Systems, pages 90–97, 1998.
[86] Z. Griliches and V. Ringstad. Errors-in-the-variables bias in nonlinear contexts. Econometrica, 38(2):368–370, 1970.
[87] S. R. Gunn. Support vector machines for classification and regression. USC-ISIS Technical Report, 1998.
[88] J. Guo, M. W. Mak, and S. Y. Kung. Eukaryotic protein subcellular localization based on local pairwise profile alignment SVM. Proceedings, 2006 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '06), pages 416–422, 2006.
[89] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389–422, 2002.
[90] B. H. Juang, S. Y. Kung, and C. A. Kamm (Editors). Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, Princeton, NJ, 1991.
[91] B. Hammer, A. Hasenfuss, F. Rossi, and M. Strickert. Topographic processing of relational data. In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, 2007.
[92] J. A. Hartigan. Direct clustering of a data matrix. J. Am. Statist. Assoc., 67(337):123–129, 1972.
[93] T. Hastie, R. Tibshirani, M. Eisen et al. “Gene shaving” as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol., 1(2):research0003.1–research0003.21, 2000.
[94] S. Haykin. Adaptive Filter Theory, 3rd edition. Englewood Cliffs, NJ: Prentice Hall, 1996.
[95] S. Haykin. Neural Networks: A Comprehensive Foundation, 2nd edition. Englewood Cliffs, NJ: Prentice Hall, 2004.
[96] X. He, D. Cai, and P. Niyogi. Laplacian score for feature selection. In Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press, 2005.
[97] M. A. Hearst, B. Schölkopf, S. Dumais, E. Osuna, and J. Platt. Trends and controversies - support vector machines. IEEE Intelligent Systems, 13:18–28, 1998.
[98] V. J. Hodge and J. Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 22:85–126, 2004.
[99] A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 42(1):80–86, 1970.
[99b] T. Hofmann, B. Schölkopf, and A. J. Smola. Kernel methods in machine learning. Ann. Statist., 36(3):1171–1220, 2008.
[100] H. Hotelling. Analysis of a complex of statistical variables into principal components. J. Educational Psychol., 24:498–520, 1933.
[101] C. W. Hsu and C. J. Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, 2002.
[102] D. Hsu, S. M. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems 22, Cambridge, MA: MIT Press, pages 772–780, 2009.
[103] S. Hua and Z. Sun. A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. J. Molec. Biol., 308(2):397–407, 2001.
[104] Y. Huang and Y. D. Li. Prediction of protein subcellular locations using fuzzy K-NN method. Bioinformatics, 20(1):21–28, 2004.
[105] P. J. Huber. Robust statistics: A review. Ann. Math. Statist., 43:1041–1067, 1972.
[106] P. J. Huber. Robust Statistics. New York: John Wiley and Sons, 1981.
[107] N. Iizuka, M. Oka, H. Yamada-Okabe et al. Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Lancet, 361(9361):923–929, 2003.
[108] R. Inokuchi and S. Miyamoto. LVQ clustering and SOM using a kernel function. In Proceedings of IEEE International Conference on Fuzzy Systems, Volume 3, pages 1497–1500, 2004.
[109] L. B. Jack and A. K. Nandi. Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms. Mechanical Systems Signal Processing, 16:373–390, 2002.
[110] P. Jafari and F. Azuaje. An assessment of recently published gene expression data analyses: Reporting experimental design and statistical factors. BMC Med. Inform., 6:27, 2006.
[111] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Comput. Surveys, 31(3):264–323, 1999.
[112] W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, Volume 1. Berkeley, CA: University of California Press, pages 361–380, 1960.
[113] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. Proceedings of European Conference on Machine Learning, Berlin: Springer, pages 137–142, 1997.
[114] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, 1999.
[115] I. T. Jolliffe. Principal Component Analysis, 2nd edition. New York: Springer, 2002.
[116] E. M. Jordaan and G. F. Smits. Robust outlier detection using SVM regression. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, volume 3, pages 2017–2022, 2004.
[117] M. I. Jordan and C. M. Bishop. An Introduction to Probabilistic Graphical Models. Cambridge, MA: MIT Press, 2002.
[118] B. H. Juang, S. Y. Kung, and C. A. Kamm. IEEE Workshops on Neural Networks for Signal Processing. New York: IEEE Press, 1991.
[119] T. Kailath. Linear Systems. Englewood Cliffs, NJ: Prentice Hall, 1980.
[120] T. Kailath, A. H. Sayed, and B. Hassibi. Linear Estimation. Englewood Cliffs, NJ: Prentice Hall, 2000.
[121] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int. J. Computer Vision, 1:321–331, 1987.
[122] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13:637–649, 2001.
[123] J. Khan, J. S. Wei, M. Ringner et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6):673–679, 2001.
[124] J. Khan, J. S. Wei, M. Ringner et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7:673–679, 2001.
[125] K. I. Kim, K. Jung, S. H. Park, and H. J. Kim. Texture classification with kernel principal component analysis. Electron. Lett., 36(12):1021–1022, 2000.
[126] G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applications, 33:82–95, 1971.
[127] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[128] R. Kohavi and G. H. John. Wrappers for feature selection. Artif. Intell., 97(1-2):273–324, 1997.
[129] T. Kohonen. Self-organized formation of topologically correct feature map. Biol. Cybernet., 43:59–69, 1982.
[130] T. Kohonen. Self-Organization and Associative Memory. New York: Springer, 1984.
[131] T. Kohonen. Self-Organizing Maps, 2nd edition. Berlin: Springer, 1997.
[132] T. Kudo and Y. Matsumoto. Chunking with support vector machines. In Proceedings, North American Chapter of the Association for Computational Linguistics, 2001.
[133] H. T. Kung. Why systolic architectures? IEEE Computer, 15(1):37–46, 1982.
[134] S. Y. Kung. VLSI Array Processors. Englewood Cliffs, NJ: Prentice Hall, 1988.
[135] S. Y. Kung. Digital Neural Networks. Englewood Cliffs, NJ: Prentice Hall, 1993.
[136] S. Y. Kung. Kernel approaches to unsupervised and supervised machine learning. In Proceedings of PCM 2009, Bangkok, pages 1–32. Berlin: Springer-Verlag, 2009.
[137] S. Y. Kung, K. I. Diamantaras, and J. S. Taur. Adaptive principal component extraction (APEX) and applications. IEEE Trans. Signal Processing, 42(5):1202–1217, 1994.
[138] S. Y. Kung, F. Fallside, J. A. Sorensen, and C. A. Kamm (Editors). Neural Networks for Signal Processing II. Piscataway, NJ: IEEE, 1992.
[139] S. Y. Kung and Man-Wai Mak. PDA-SVM hybrid: A unified model for kernel-based supervised classification. J. Signal Processing Systems, 65(1):5–21, 2011.
[140] S. Y. Kung and M. W. Mak. Feature selection for self-supervised classification with applications to microarray and sequence data. IEEE J. Selected Topics Signal Processing: Special Issue Genomic and Proteomic Signal Processing, 2(3):297–309, 2008.
[141] S. Y. Kung, M. W. Mak, and S. H. Lin. Biometric Authentication: A Machine Learning Approach. Upper Saddle River, NJ: Prentice Hall, 2005.
[142] S. Y. Kung, M. W. Mak, and I. Tagkopoulos. Multi-metric and multi-substructure biclustering analysis for gene expression data. In IEEE Computational Systems Bioinformatics Conference, Stanford, CA, 2005.
[143] S. Y. Kung, M. W. Mak, and I. Tagkopoulos. Symmetric and asymmetric multi-modality biclustering analysis for microarray data matrix. J. Bioinformatics Comput. Biol., 4(3):275–298, 2006.
[144] S. Y. Kung and Peiyuan Wu. On efficient learning and classification kernel methods. In Proceedings, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), Kyoto, 2012.
[145] S. Y. Kung and Peiyuan Wu. Perturbation regulated kernel regressors for supervised machine learning. In Proceedings, 2012 IEEE International Workshop on Machine Learning for Signal Processing (MLSP '12), 2012.
[146] S. Y. Kung and Yuhui Luo. Recursive kernel trick for network segmentation. Int. J. Robust Nonlinear Control, 21(15):1807–1822, 2011.
[147] N. Lawrence and B. Schölkopf. Estimating a kernel Fisher discriminant in the presence of label noise. In Proceedings of the 18th International Conference on Machine Learning, San Francisco. New York: Morgan Kaufmann, 2001.
[148] L. Lazzeroni and A. B. Owen. Plaid models for gene expression data. Technical report, 03, 2000.
[149] K. H. Lee, S. Y. Kung, and N. Verma. Improving kernel-energy trade-offs for machine learning in implantable and wearable biomedical applications. In Proceedings of ICASSP, pages 1597–1600, 2011.
[150] K. H. Lee, S. Y. Kung, and N. Verma. Low-energy formulations of support vector machine kernel functions for biomedical sensor applications. Journal of Signal Processing Systems, Berlin: Springer, published online, 2012.
[151] B. Levy. Principles of Signal Detection and Parameter Estimation. Berlin: Springer, 2008.
[152] Dan Li. Performance Evaluation of Maximum Log-Likelihood Classification, ELE 571 Course Project Report, Princeton University, Princeton, NJ, 2010.
[153] L. Q. Li, Y. Zhang, L. Y. Zou, Y. Zhou, and X. Q. Zheng. Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition. Protein Peptide Lett., 19:375–387, 2012.
[154] T. Li and M. Ogihara. Toward intelligent music information retrieval. IEEE Trans. Multimedia, 8(3):564–574, 2006.
[155] Y. Liang, H. Wu, R. Lei et al. Transcriptional network analysis identifies BACH1 as a master regulator of breast cancer bone metastasis. J. Biol. Chem., 287(40):33533–33544, 2012.
[156] Opensource EEG libraries and toolkit.
[157] W. Liu, J. C. Principe, and S. Haykin. Kernel Adaptive Filtering: A Comprehensive Introduction. New York: Wiley, 2010.
[158] L. Ljung. System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice Hall, 1999.
[159] Z.-P. Lo, M. Fujita, and B. Bavarian. Analysis of neighborhood interaction in Kohonen neural networks. In Proceedings, 6th International Parallel Processing Symposium, Los Alamitos, CA, pages 247–249, 1991.
[160] Z.-P. Lo, Y. Yu, and B. Bavarian. Analysis of the convergence properties of topology preserving neural networks. IEEE Trans. Neural Networks, 4:207–220, 1993.
[161] G. Lubec, L. Afjehi-Sadat, J. W. Yang, and J. P. John. Searching for hypothetical proteins: Theory and practice based upon original data and literature. Prog. Neurobiol., 77:90–127, 2005.
[162] S. Ma and J. Huang. Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics, 21(24):4356–4362, 2005.
[163] D. MacDonald and C. Fyfe. The kernel self organising map. In Proceedings of 4th International Conference on Knowledge-Based Intelligence Engineering Systems and Applied Technologies, 2000.
[164] J. MacQueen. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam and J. Neyman, editors, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probabilities, volume 1, pages 281–297. Berkeley, CA: University of California Press, 1967.
[165] P. C. Mahalanobis. On the generalised distance in statistics. J. Proc. Asiatic Soc. Bengal, 2:49–55, 1936.
[166] M. W. Mak, J. Guo, and S. Y. Kung. PairProSVM: Protein subcellular localization based on local pairwise profile alignment and SVM. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 5(3):416–422, 2008.
[167] M. W. Mak and S. Y. Kung. A solution to the curse of dimensionality problem in pairwise scoring techniques. In International Conference on Neural Information Processing, pages 314–323, 2006.
[168] M. W. Mak and S. Y. Kung. Fusion of feature selection methods for pairwise scoring SVM. Neurocomputing, 71(16-18):3104–3113, 2008.
[169] M. W. Mak and S. Y. Kung. Low-power SVM classifiers for sound event classification on mobile devices. In Proceedings of ICASSP, Kyoto, 2012.
[170] O. L. Mangasarian, G. Fung, and J. W. Shavlik. Knowledge-based nonlinear kernel classifiers. In Learning Theory and Kernel Machines. Berlin: Springer-Verlag, pages 102–113, 2003.
[171] H. B. Mann and D. R. Whitney. On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18:50–60, 1947.
[172] Mathworks-SVM. Mathworks bioinformatics toolbox.
[173] M. Mavroforakis and S. Theodoridis. A geometric approach to support vector machine (SVM) classification. IEEE Trans. Neural Networks, 17(3):671–682, 2006.
[174] G. J. McLachlan. Discriminant Analysis and Statistical Pattern Recognition. New York: John Wiley & Sons, 1992.
[175] J. Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Trans. London Phil. Soc., A209:415–446, 1909.
[176] S. Mika. Kernel Fisher Discriminants. PhD thesis, The Technical University of Berlin, Berlin, 2002.
[177] S., Mika, G., Ratsch, and K. R., Muller. A mathematical programming approach to the kernel Fisher algorithm. In Advances in Neural Information Processing Systems 14.Cambridge, MA: MIT press, pages 591–597, 2001.
[178] S., Mika, G., Ratsch, J., Weston, B., Schölkopf, and K. R., Mullers. Fisher discriminant analysis with kernels. In Y. H., Hu, J., Larsen, E., Wilson, and S., Douglas, editors, Neural Networks for Signal Processing IX, pages 41–48, 1999.
[179] S., Mika, A.J., Smola, and B., Schölkopf. An improved training algorithm for kernel Fisher discriminants. In T., Jaakkola and T., Richardson, editors, Proceedings AISTATS, San Francisco, CA, pages 98–104. New York: Morgan Kaufmann, 2001.
[180] B., Mirkin. Mathematical Classification and Clustering.Berlin: Springer, 1996.
[181] T. M., Mitchell. Machine Learning.New York: McGraw-Hill, 1997.
[182] P. J., Moreno, P. P., Ho, and N., Vasconcelos. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. Technical Report, HP Laboratories Cambridge, 2004.
[183] MSPsim.
[184] S., Mukherjee, E., Osuna, and F., Girosi. Nonlinear prediction of chaotic time series using support vector machines. In J., Principe, L., Giles, N., Morgan, and E., Wilson, editors, Proceedings, IEEE Workshop on Neural Networks for Signal Processing, Amelia Island, FL, pages 276–285, 1997.
[185] K. R., Muller, S., Mika, G., Ratsch, K., Tsuda, and B., Schölkopf. An introduction to kernelbased learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001.Google Scholar
[186] K. R., Muller, A., Smola, G., Ratschet al.Predicting time series with support vector machines. In Proceedings, International Conference on Artificial Neural Networks, London: Springer-Verlag, pages 999–1004, 1997.
[187] K. R., Muller, S., Mika, G., Ratsch, K., Tsuda, and B., Schölkopf. An introduction to kernelbased learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001.Google Scholar
[188] N., Murata, K. R., Muller, A., Ziehe, and S., Amari. Adaptive on-line learning in changing environments. In M. C., Mozer, M. I., Jordan, and T, Petsche, editors, Advances in Neural Information Processing Systems 9.Cambridge, MA: MIT press, pages 599–605, 1997.
[189] C. L., Myers. Context-sensitive methods for learning from genomic data. Thesis, Department of Electrical Engineering, Princeton University, Princeton, NJ, 2007.
[190] C. L., Myers, M., Dunham, S. Y., Kung, and O., Troyanskaya. Accurate detection of aneuploidies in array CGH and gene expression microarray data. In Bioinformatics. Published online, Oxford University Press, 2005.
[191] E. A., Nadaraya. On estimating regression. Theory Probability Applications, 9:141–142, 1964.
[192] Neural network frequently asked questions.
[193] J., Neyman and E. S., Pearson. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20:175–240, 1928.
[194] S., Niijima and Y., Okuno. Laplacian linear discriminant analysis approach to unsupervised feature selection. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 6(4):605–614, 2009.
[195] C. L., Nutt, D. R., Mani, R. A., Betensky et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res., 63(7):1602–1607, 2003.
[196] E., Oja. A simplified neuron model as a principal component analyzer. J. Math. Biol., 15:267–273, 1982.
[197] E., Osuna, R., Freund, and F., Girosi. An improved training algorithm for support vector machines. In J., Principe, L., Giles, N., Morgan, and E., Wilson, editors, Proceedings, IEEE Workshop on Neural Networks for Signal Processing VII, Amelia Island, FL, pages 276–285, 1997.
[198] D., Parker. Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, MIT, Cambridge, MA, 1985.
[199] E., Parzen. On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065–1076, 1962.
[200] P., Pavlidis, J., Weston, J., Cai, and W. N., Grundy. Gene functional classification from heterogeneous data. In International Conference on Computational Biology, Pittsburgh, PA, pages 249–255, 2001.
[201] K., Pearson. On lines and planes of closest fit to systems of points in space. Phil. Mag. Ser. 6, 2:559–572, 1901.
[202] M. S., Pepe. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press, 2003.
[203] PhysioNet.
[204] J. C., Platt. Fast training of support vector machines using sequential minimal optimization. In B., Schölkopf, C. J. C., Burges, and A. J., Smola, editors, Advances in Kernel Methods - Support Vector Learning, Cambridge, MA: MIT Press, pages 185–208, 1999.
[205] J. C., Platt. Using analytic QP and sparseness to speed training of support vector machines. In Advances in Neural Information Processing Systems 10, 1998.
[206] N., Pochet, F., De Smet, J. A. K., Suykens, and B. L. R., De Moor. Systematic benchmarking of microarray data classification: Assessing the role of nonlinearity and dimensionality reduction. Bioinformatics, 20(17):3185–3195, 2004.
[207] T., Poggio and F., Girosi. Networks for approximation and learning. Proc. IEEE, 78(9):1481–1497, 1990.
[208] S. L., Pomeroy, P., Tamayo, M., Gaasenbeek et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870):436–442, 2002.
[209] M., Pontil and A., Verri. Support vector machines for 3D object recognition. IEEE Trans. Pattern Analysis Machine Intell., 20:637–646, 1998.
[210] H. V., Poor. An Introduction to Signal Detection and Estimation, 2nd edition, Berlin: Springer, 1994.
[211] D. M., Pozar. Microwave Engineering, 3rd edition. New York: Wiley, 2005.
[212] J. C., Rajapakse, K. B., Duan, and W. K., Yeo. Proteomic cancer classification with mass spectrometry data. Am. J. Pharmacogenomics, 5(5):281–292, 2005.
[213] S., Ramaswamy, P., Tamayo, R., Rifkin et al. Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, 98(26):15149–15154, 2001.
[214] J., Read, B., Pfahringer, G., Holmes, and E., Frank. Classifier chains for multi-label classification. In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 254–269, 2009.
[215] M., Reich, K., Ohm, M., Angelo, P., Tamayo, and J. P., Mesirov. GeneCluster 2.0: An advanced toolset for bioarray analysis. Bioinformatics, 20(11):1797–1798, 2004.
[216] A., Reinhardt and T., Hubbard. Using neural networks for prediction of the subcellular location of proteins. Nucl. Acids Res., 26:2230–2236, 1998.
[217] B. D., Ripley. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press, 1996.
[218] J., Rissanen. A universal prior for integers and estimation by minimum description length. Ann. Statist., 11(2):416–431, 1983.
[219] H., Ritter, T., Martinetz, and K., Schulten. Neural Computation and Self-Organizing Maps: An Introduction. Reading, MA: Addison-Wesley, 1992.
[220] F., Rosenblatt. The perceptron: A probabilistic model for information storage and organization of the brain. Psychol. Rev., 65:42–99, 1958.
[221] M., Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist., 27:832–837, 1956.
[222] M., Rosenblatt. Density estimates and Markov sequences. In M., Puri, editor, Nonparametric Techniques in Statistical Inference. London: Cambridge University Press, pages 199–213, 1970.
[223] V., Roth, J., Laub, J. M., Buhmann, and K.-R., Muller. Going metric: Denoising pairwise data. In Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, pages 817–824, 2003.
[224] V., Roth and V., Steinhage. Nonlinear discriminant analysis using kernel functions. In S. A., Solla, T. K., Leen, and K.-R., Muller, editors, Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, pages 568–574, 2000.
[225] R., Rothe, Yinan, Yu, and S. Y., Kung. Parameter design tradeoff between prediction performance and training time for ridge-SVM. In Proceedings, 2013 IEEE International Workshop on Machine Learning for Signal Processing, Southampton, 2013.
[226] D. E., Rumelhart, G. E., Hinton, and R. J., Williams. Learning internal representations by error propagation. In D. E., Rumelhart, J. L., McClelland, and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. Cambridge, MA: MIT Press/Bradford Books, 1986.
[227] T. D., Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 12:459–473, 1989.
[228] A., Sayed. Fundamentals of Adaptive Filtering. New York: Wiley, 2003.
[229] A., Sayed and T., Kailath. A state space approach to adaptive RLS filtering. IEEE Signal Processing Mag., 11:18–60, 1994.
[230] A. H., Sayed. Fundamentals of Adaptive Filtering. John Wiley, 2003 (see page 30).
[231] R. E., Schapire and Y., Singer. Boostexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000.
[232] L., Scharf. Statistical Signal Processing. Reading, MA: Addison-Wesley, 1991.
[233] S. M., Schennach. Nonparametric regression in the presence of measurement error. Econometric Theory, 20(6):1046–1093, 2004.
[234] B., Schölkopf. Statistical learning and kernel methods. Technical Report MSR-TR 200023, Microsoft Research, 2000.
[235] B., Schölkopf, C., Burges, and V., Vapnik. Incorporating invariances in support vector learning machines. In Proceedings, International Conference on Artificial Neural Networks, 1996.
[236] B., Schölkopf, R., Herbrich, A., Smola, and R., Williamson. A generalized representer theorem. NeuroCOLT2 Technical Report Series, NC2-TR-2000-82, 2000.
[237] B., Schölkopf, J. C., Platt, J., Shawe-Taylor, A. J., Smola, and R. C., Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443–1472, 2001.
[238] B., Schölkopf, A., Smola, and K.-R., Muller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10:1299–1319, 1998.
[238b] B., Schölkopf, A. J., Smola, R. C., Williamson, and P. L., Bartlett. New support vector algorithms. Neural Comput., 12:1207–1245, 2000.
[239] B., Schölkopf and A. J., Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002.
[240] B., Schölkopf, R. C., Williamson, A. J., Smola, J., Shawe-Taylor, and J. C., Platt. Support vector method for novelty detection. In S. A., Solla, T. K., Leen, and K.-R., Muller, editors, Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, pages 568–574, 2000.
[241] A., Schwaighofer. SVM toolbox for MATLAB.
[242] G., Schwarz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978.
[243] D. J., Sebald and J. A., Bucklew. Support vector machine techniques for nonlinear equalization. IEEE Trans. Signal Processing, 48(11):3217–3226, 2000.
[244] J., Shawe-Taylor and N., Cristianini. Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press, 2004.
[245] H. B., Shen and K. C., Chou. Virus-mPLoc: A fusion classifier for viral protein subcellular location prediction by incorporating multiple sites. J. Biomol. Struct. Dyn., 26:175–186, 2010.
[246] P., Simard, A., Smola, B., Schölkopf, and V., Vapnik. Prior knowledge in support vector kernels. Advances in Neural Information Processing Systems 10, pages 640–646, 1998.
[247] D., Singh, P. G., Febbo, K., Ross et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–209, 2002.
[248] I., Yamada, K., Slavakis, and S., Theodoridis. Online classification using kernels and projection-based adaptive algorithms. IEEE Trans. Signal Processing, 56(7):2781–2797, 2008.
[249] T. F., Smith and M. S., Waterman. Comparison of biosequences. Adv. Appl. Math., 2:482–489, 1981.
[250] A. J., Smola, B., Schölkopf, and K. R., Müller. The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649, 1998.
[251] A. J., Smola, P. L., Bartlett, B., Schölkopf, and D., Schuurmans. Advances in Large Margin Classifiers. Cambridge, MA: MIT Press, 2000.
[252] P. H. A., Sneath and R. R., Sokal. Numerical Taxonomy: The Principles and Practice of Numerical Classification. San Francisco, CA: W. H. Freeman, 1973.
[253] M., Song, C., Breneman, J., Bi et al. Prediction of protein retention times in anion-exchange chromatography systems using support vector regression. J. Chem. Information Computer Sci., 42:1347–1357, 2002.
[254] M. H., Song, J., Lee, S. P., Cho, K. J., Lee, and S. K., Yoo. Support vector machine based arrhythmia classification using reduced features. Int. J. Control, Automation, Systems, 3:571–579, 2005.
[255] T., Sørlie, R., Tibshirani, J., Parker et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. PNAS, 100(14):8418–8423, 2003.
[256] D. F., Specht. Probabilistic neural networks. Neural Networks, 3:109–118, 1990.
[257] P. T., Spellman, G., Sherlock, M. Q., Zhang et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12):3273–3297, 1998.
[258] M. O., Stitson and J. A. E., Weston. Implementational issues of support vector machines. Technical Report CSD-TR-96-18, Computational Intelligence Group, Royal Holloway, University of London, 1996.
[259] G., Strang. Introduction to Linear Algebra. Wellesley, MA: Wellesley Cambridge Press, 2003.
[260] J. A. K., Suykens and J., Vandewalle. Least squares support vector machine classifiers. Neural Processing Lett., 9(3):293–300, 1999.
[261] SVMlight.
[262] I., Tagkopoulos, N., Slavov, and S. Y., Kung. Multi-class biclustering and classification based on modeling of gene regulatory networks. In Proceedings, Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE '05). Minneapolis, MN, pages 89–96, 2005.
[263] P., Tamayo, D., Slonim, J., Mesirov et al. Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Nat. Acad. Sci. USA, 96:2907–2912, 1999.
[264] S., Tavazoie, D., Hughes, M. J., Campbell, R. J., Cho, and G. M., Church. Systematic determination of genetic network architecture. Nature Genetics, 22:281–285, 1999.
[265] D. M. J., Tax and R. P. W., Duin. Data domain description using support vectors. In M., Verleysen (Editor), Proceedings of the European Symposium on Artificial Neural Networks, ESANN '99, Brussels, pages 251–256, 1999.
[266] D. M. J., Tax and R. P. W., Duin. Support vector domain description. Pattern Recognition Lett., 20:1191–1199, 1999.
[267] S., Theodoridis and K., Koutroumbas. Pattern Recognition, 4th edition. New York: Academic Press, 2008.
[268] R., Tibshirani. Regression shrinkage and selection via the LASSO. J. Royal Statist. Soc. B, 58:267–288, 1996.
[269] F., Tobar, D., Mandic, and S. Y., Kung. The multikernel least mean square algorithm. IEEE Trans. Neural Networks Learning Systems, 99, accepted for publication, 2013.
[270] C., Tong, V., Svetnik, B., Schölkopf et al. Novelty detection in mass spectral data using a support vector machine method. In Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000.
[271] S., Tong and D., Koller. Support vector machine active learning with applications to text classification. J. Machine Learning Res., 2:45–66, 2002.
[272] L. N., Trefethen and D., Bau III. Numerical Linear Algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1997.
[273] G., Tsoumakas and I., Katakis. Multi-label classification: An overview. Int. J. Data Warehousing Mining, 3:1–13, 2007.
[274] G., Tsoumakas, I., Katakis, and I., Vlahavas. Mining multi-label data. In O., Maimon and L., Rokach (Editors), Data Mining and Knowledge Discovery Handbook, 2nd edition. Berlin: Springer, 2010.
[275] A. N., Tychonoff. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.
[276] I., Van Mechelen, H. H., Bock, and P., De Boeck. Two-mode clustering methods: A structured overview. Statist. Methods Med. Res., 13(5):363–394, 2004.
[277] L. J., van't Veer, Hongyue, Dai, M. J., van de Vijver et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536, 2002.
[278] V., Vapnik. Estimation of dependences based on empirical data [in Russian]. Moscow, Nauka, 1979. (English translation New York: Springer, 1982.)
[279] V., Vapnik, S., Golowich, and A., Smola. Support vector method for function approximation, regression estimation, and signal processing. In M., Mozer, M., Jordan, and T., Petsche (editors), Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press, pages 281–287, 1997.
[280] V. N., Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[281] V. N., Vapnik. Statistical Learning Theory. New York: Wiley, 1998.
[282] C., Vens, J., Struyf, L., Schietgat, S., Dzeroski, and H., Blockeel. Decision trees for hierarchical multi-label classification. Machine Learning, 73(2):185–214, 2008.
[283] N., Villa and F., Rossi. A comparison between dissimilarity SOM and kernel SOM clustering the vertices of a graph. In Proceedings of the 6th International Workshop on Self-Organizing Maps. Bielefeld: Bielefeld University, 2007.
[284] G., Wahba. Spline Models for Observational Data. Philadelphia, PA: SIAM, 1990.
[285] Shibiao, Wan, Man-Wai, Mak, and S. Y., Kung. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines. BMC Bioinformatics, 13:290, 2012 (available at
[286] Shibiao, Wan, Man-Wai, Mak, and S. Y., Kung. Adaptive thresholding for multi-label SVM classification with application to protein subcellular localization prediction. In Proceedings of ICASSP '13, Vancouver, pages 3547–3551, 2013.
[287] Jeen-Shing, Wang and Jen-Chieh, Chiang. A cluster validity measure with outlier detection for support vector clustering. IEEE Trans. Systems, Man, Cybernet. B, 38:78–89, 2008.
[288] L., Wang, J., Zhu, and H., Zou. The doubly regularized support vector machine. Statist. Sinica, 16:589–615, 2006.
[289] Y., Wang, A., Reibman, F., Juang, T., Chen, and S. Y., Kung. In Proceedings of the IEEE Workshops on Multimedia Signal Processing. Princeton, MA: IEEE Press, 1997.
[290] Z., Wang, S. Y., Kung, J., Zhang et al. Computational intelligence approach for gene expression data mining and classification. In Proceedings of the IEEE International Conference on Multimedia & Expo. Princeton, MA: IEEE Press, 2003.
[291] Z., Wang, Y., Wang, J., Lu et al. Discriminatory mining of gene expression microarray data. J. Signal Processing Systems, 35:255–272, 2003.
[292] J. H., Ward. Hierarchical grouping to optimize an objective function. J. Am. Statist. Assoc., 58:236–244, 1963.
[293] G. S., Watson. Smooth regression analysis. Sankhya: Indian J. Statist. Ser. A, 26:359–372, 1964.
[294] P. J., Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge, MA, 1974.
[295] J., Weston, A., Elisseeff, G., BakIr, and F., Sinz.
[296] J., Weston, A., Elisseeff, B., Schölkopf, and M., Tipping. Use of the zero-norm with linear models and kernel methods. J. Machine Learning Res., 3:1439–1461, 2003.
[297] J., Weston and C., Watkins. Multi-class support vector machines. In Proceedings of ESANN, Brussels, 1999.
[298] B., Widrow and S. D., Stearns. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 1984.
[299] N., Wiener. Interpolation and Smoothing of Stationary Time Series. Cambridge, MA: MIT Press, 1949.
[299b] S., Winters-Hilt, A., Yelundur, C., McChesney, and M., Landry. Support vector machine implementations for classification clustering. BMC Bioinformatics, 7:S4, published online, 2006.
[300] L., Wolf and A., Shashua. Feature selection for unsupervised and supervised inference: The emergence of sparsity in a weight-based approach. J. Machine Learning Res., 6:1855–1887, 2005.
[301] M. A., Woodbury. Inverting modified matrices. In Statistical Research Group Memorandum Report 42, MR38136, Princeton University, Princeton, NJ, 1950.
[302] Peiyuan, Wu and S. Y., Kung. Kernel-induced optimal linear estimators and generalized Gauss-Markov theorems. Submitted 2013.
[302b] Peiyuan, Wu, C. C., Fang, J. M., Chang, S., Gilbert, and S. Y., Kung. Cost-effective kernel ridge regression implementation for keystroke-based active authentication system. In Proceedings of ICASSP '14, Florence, Italy, 2014.
[303] Z. D., Wu, W. X., Xie, and J. P., Yu. Fuzzy c-means clustering algorithm based on kernel method. In Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, pages 49–54, 2003.
[304] X., Xiao, Z. C., Wu, and K. C., Chou. iLoc-Virus: A multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. J. Theor. Biol., 284:42–51, 2011.
[305] E. P., Xing and R. M., Karp. CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics, 17(90001):306–315, 2001.
[306] H., Xu, C., Caramanis, and S., Mannor. Robust regression and LASSO. In Advances in Neural Information Processing Systems 21. Cambridge, MA: MIT Press, pages 1801–1808, 2009.
[307] Rui, Xu and D., Wunsch II. Survey of clustering algorithms. IEEE Trans. Neural Networks, 16(3):645–678, 2005.
[308] L., Yan, R., Dodier, M. C., Mozer, and R., Wolniewicz. Optimizing classifier performance via the Wilcoxon-Mann-Whitney statistics. In Proceedings of the International Conference on Machine Learning, pages 848–855, 2003.
[309] Haiqin, Yang, Kaizhu, Huang, Laiwan, Chan, I., King, and M. R., Lyu. Outliers treatment in support vector regression for financial time series prediction. In ICONIP '04, pages 1260–1265, 2004.
[310] Yinan, Yu, K., Diamantaras, T., McKelvey, and S. Y., Kung. Ridge-adjusted slack variable optimization for supervised classification. In IEEE International Workshop on Machine Learning for Signal Processing, Southampton, 2013.
[311] Yinan, Yu, T., McKelvey, and S. Y., Kung. A classification scheme for “high-dimensional-small-sample-size” data using SODA and ridge-SVM with medical applications. In Proceedings, 2013 International Conference on Acoustics, Speech, and Signal Processing, 2013.
[311b] Yinan, Yu, T., McKelvey, and S. Y., Kung. Kernel SODA: A feature reduction technique using kernel based analysis. In Proceedings, 12th International Conference on Machine Learning and Applications (ICMLA '13), volume 4B, page 340.
[312] G.-X., Yuan, C.-H., Ho, and C.-J., Lin. Recent advances of large-scale linear classification. Proc. IEEE, 100:2584–2603, 2012.
[313] M., Yukawa. Multikernel adaptive filtering. IEEE Trans. Signal Processing, 60(9):4672–4682, 2012.
[314] A. L., Yuille and N. M., Grzywacz. The motion coherence theory. In Proceedings, International Conference on Computer Vision, pages 344–353, 1988.
[315] D.-Q., Zhang and S.-C., Chen. A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artif. Intell. Med., 32:37–50, 2004.
[316] Lei, Zhang, Fuzong, Lin, and Bo, Zhang. Support vector machine learning for image retrieval. In Proceedings of the 2001 International Conference on Image Processing, volume 2, pages 721–724, 2001.
[317] M. Q., Zhang. Computational prediction of eukaryotic protein-coding genes. Nature Rev. Genetics, 3(9):698–709, 2002.
[318] X. G., Zhang, X., Lu, Q., Shi et al. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics, 7(197), 2006.
[319] Z., Zhang, X. D., Gu, and S. Y., Kung. Color-frequency-orientation histogram based image retrieval. In Proceedings of ICASSP, Kyoto, 2012.
[319b] Z., Zhang, G., Page, and H., Zhang. Applying classification separability analysis to microarray data. In S. M., Lin and K. F., Johnson, editors, Methods of Microarray Data Analysis, CAMDA '00. Boston, MA: Kluwer Academic Publishers, pages 125–136, 2001.
[320] J., Zhu, S., Rosset, T., Hastie, and R., Tibshirani. 1-norm SVMs. In Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2004.
[321] H., Zou and T., Hastie. Regularization and variable selection via the elastic net. J. Royal Statist. Soc., Ser. B, 67(2):301–320, 2005.