Analysis of Multivariate and High-Dimensional Data
  • Cited by 17

Book description

'Big data' poses challenges that require both classical multivariate methods and contemporary techniques from machine learning and engineering. This modern text equips you for the new world, integrating the old and the new, fusing theory and practice and bridging the gap to statistical learning. The theoretical framework includes formal statements that set out clearly the guaranteed 'safe operating zone' for the methods and allow you to assess whether data is in the zone, or near enough. Extensive examples showcase the strengths and limitations of different methods with small classical data, data from medicine, biology, marketing and finance, high-dimensional data from bioinformatics, functional data from proteomics, and simulated data. High-dimension low-sample-size data gets special attention. Several data sets are revisited repeatedly to allow comparison of methods. Generous use of colour, algorithms, Matlab code, and problem sets completes the package. Suitable for master's/graduate students in statistics and researchers in data-rich disciplines.


'… this book is suitable for readers with various backgrounds and interests and can be read at different levels. … [It] will also be useful for working statisticians who are interested in analysis of multivariate or high-dimensional data.'

Yasunori Fujikoshi Source: Mathematical Reviews

'A major advantage and important feature of this book is that it illustrates the interconnection of various techniques, such as the connection between discriminant analysis and principal component analysis, cluster analysis and principal component analysis, and factor analysis and cluster analysis. The author has undoubtedly taken on an appreciable task of collecting, compiling, connecting and presenting classical as well as recent developments concisely … Readers are expected to have an adequate background in statistics to understand the topics in this book. This well-designed book is suitable for advanced multivariate courses where the emphasis is more on applications. Application-orientated researchers will also find this book useful. Problems for each chapter are given at the end of each section. Graphics are used at various places to facilitate the understanding of topics. The book is highly recommended for libraries.'

Source: Journal of the Royal Statistical Society

'I must highly commend the author for writing an excellent comprehensive review of multivariate and high dimensional statistics … The lucid treatment and thoughtful presentation are two additional attractive features … Without any hesitation and with admiration, I would give the author a 10 out of 10 … The feat she has accomplished successfully for this difficult area of statistics is something very few could accomplish. The wealth of information is enormous and a motivated student can learn a great deal from this book … I highly recommend [it] to researchers working in the field of high dimensional data and to motivated graduate students.'

Ravindra Khattree Source: International Statistical Review

References

Abramowitz, M., and I.A., Stegun (1965). Handbook of Mathematical Functions. New York: Dover.
Aeberhard, S., D., Coomans and O., de Vel (1992). Comparison of classifiers in high dimensional settings. Tech. Rep. No. 92-02, Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. Data sets collected by Forina et al. and available
Aebersold, R., and M., Mann (2003). Mass spectrometry-based proteomics. Nature 422, 198–207.
Aha, D., and D., Kibler (1991). Instance-based learning algorithms. Machine Learning 6, 37–66.
Aharon, M., M., Elad and A., Bruckstein (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing 54, 4311–4322.
Ahn, J., and J.S., Marron (2010). The maximal data piling direction for discrimination. Biometrika 97, 254–259.
Ahn, J., J.S., Marron, K.M., Mueller and Y.-Y., Chi (2007). The high-dimension low-sample-size geometric representation holds under mild conditions. Biometrika 94, 760–766.
Amari, S.-I. (2002). Independent component analysis (ICA) and method of estimating functions. IEICE Trans. Fundamentals E 85A(3), 540–547.
Amari, S.-I., and J.-F., Cardoso (1997). Blind source separation—Semiparametric statistical approach. IEEE Trans. on Signal Processing 45, 2692–2700.
Amemiya, Y., and T.W., Anderson (1990). Asymptotic chi-square tests for a large class of factor analysis models. Ann. Stat. 18, 1453–1463.
Anderson, J.C., and D.W., Gerbing (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychol. Bull. 103, 411–423.
Anderson, T.W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Stat. 34, 122–148.
Anderson, T.W. (2003). Introduction to Multivariate Statistical Analysis (3rd ed.). Hoboken, NJ: Wiley.
Anderson, T.W., and Y., Amemiya (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. Ann. Stat. 16, 759–771.
Anderson, T.W., and H., Rubin (1956). Statistical inference in factor analysis. In Third Berkeley Symposium on Mathematical Statistics and Probability 5, pp. 111–150. Berkeley: University of California Press.
Attias, H. (1999). Independent factor analysis. Neural Comp. 11, 803–851.
Bach, F.R., and M.I., Jordan (2002). Kernel independent component analysis. J. Machine Learning Res. 3, 1–48.
Baik, J., and J.W., Silverstein (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivar. Anal. 97, 1382–1408.
Bair, E., T., Hastie, D., Paul and R., Tibshirani (2006). Prediction by supervised principal components. J. Am. Stat. Assoc. 101(473), 119–137.
Barbedor, P. (2009). Independent component analysis by wavelets. Test 18(1), 136–155.
Bartlett, M.S. (1938). Further aspects of the theory of multiple regression. Proc. Cambridge Philos. Soc. 34, 33–40.
Bartlett, M.S. (1939). A note on tests of significance in multivariate analysis. Proc. Cambridge Philos. Soc. 35, 180–185.
Beirlant, J., E.J., Dudewicz, L., Gyorfi and E., van der Meulen (1997). Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. 6, 17–39.
Bell, A.J., and T.J., Sejnowski (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159.
Benaych-Georges, F., and R., Nadakuditi (2012). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227, 494–521.
Berger, J.O. (1993). Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag.
Berlinet, A., and C., Thomas-Agnan (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Boston: Kluwer Academic Publishers.
Bickel, P.J., and E., Levina (2004). Some theory for Fisher's linear discriminant function, ‘naïve Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010.
Blake, C., and C., Merz (1998). UCI repository of machine learning databases. Data sets available at:
Borg, I., and P.J.F., Groenen (2005). Modern Multidimensional Scaling: Theory and Applications (2nd ed.). New York: Springer.
Borga, M., H., Knutsson and T., Landelius (1997). Learning canonical correlations. In Proceedings of the 10th Scandinavian Conference on Image Analysis, Lappeenranta, Finland.
Borga, M., T., Landelius and H., Knutsson (1997). A unified approach to PCA, PLS, MLR and CCA. Technical report, Linköping University, Sweden.
Boscolo, R., H., Pan and V.P., Roychowdhury (2004). Independent component analysis based on nonparametric density estimation. IEEE Trans. Neural Networks 15(1), 55–65.
Breiman, L. (1996). Bagging predictors. Machine Learning 26, 123–140.
Breiman, L. (2001). Random forests. Machine Learning 45, 5–32.
Breiman, L., J., Friedman, C.J., Stone and R.A., Olshen (1998). Classification and Regression Trees. Boca Raton, FL: CRC Press.
Cadima, J., and I.T., Jolliffe (1995). Loadings and correlations in the interpretation of principal components. J. Appl. Stat. 22, 203–214.
Calinski, R.B., and J., Harabasz (1974). A dendrite method for cluster analysis. Commun. Stat. 3, 1–27.
Candes, E.J., J., Romberg and T., Tao (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52, 489–509.
Cao, X.-R., and R.-W., Liu (1996). General approach to blind source separation. IEEE Trans. Signal Processing 44, 562–571.
Cardoso, J.-F. (1998). Blind source separation: Statistical principles. Proc. IEEE 86(10), 2009–2025.
Cardoso, J.-F. (1999). High-order contrasts for independent component analysis. Neural Comput. 11(1), 157–192.
Cardoso, J.-F. (2003). Dependence, correlation and Gaussianity in independent component analysis. J. Machine Learning Res. 4, 1177–1203.
Carroll, J.D., and J.J., Chang (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition. Psychometrika 35, 283–319.
Casella, G., and R.L., Berger (2001). Statistical Inference. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.
Chaudhuri, P., and J.S., Marron (1999). Sizer for exploration of structures in curves. J. Am. Stat. Assoc. 94, 807–823.
Chaudhuri, P., and J.S., Marron (2000). Scale space view of curve estimation. Ann. Stat. 28, 408–428.
Chen, A., and P.J., Bickel (2006). Efficient independent component analysis. Ann. Stat. 34, 2825–2855.
Chen, J.Z., S. M., Pizer, E. L., Chaney and S., Joshi (2002). Medical image synthesis via Monte Carlo simulation. In T., Dohi and R., Kikinis (eds.), Medical Image Computing and Computer Assisted Intervention (MICCAI). Berlin: Springer. pp. 347–354.
Chen, L., and A., Buja (2009). Local multidimensional scaling for nonlinear dimension reduction, graph drawing and proximity analysis. J. Am. Stat. Assoc. 104, 209–219.
Choi, S., A., Cichocki, H.-M., Park and S.-Y., Lee (2005). Blind source separation and independent component analysis: A review. Neural Inform. Processing 6, 1–57.
Comon, P. (1994). Independent component analysis, A new concept? Signal Processing 36, 287–314.
Cook, D., A., Buja and J., Cabrera (1993). Projection pursuit indices based on expansions with orthonormal functions. J. Comput. Graph. Stat. 2, 225–250.
Cook, D., and D., Swayne (2007). Interactive and Dynamic Graphics for Data Analysis. New York: Springer.
Cook, R.D. (1998). Regression Graphics: Ideas for Studying Regressions through Graphics. New York: Wiley.
Cook, R.D., and S., Weisberg (1999). Applied Statistics Including Computing and Graphics. New York: Wiley.
Cook, R.D., and X., Yin (2001). Dimension reduction and visualization in discriminant analysis (with discussion). Aust. N. Z. J. Stat. 43, 147–199.
Cormack, R.M. (1971). A review of classification (with discussion). J. R. Stat. Soc. A 134, 321–367.
Cover, T.M., and P., Hart (1967). Nearest neighbor pattern classification. IEEE Trans. Inform. Theory IT-13, 21–27.
Cover, T.M., and J.A., Thomas (2006). Elements of Information Theory (2nd ed.). Hoboken, NJ: John Wiley.
Cox, D.R., and D.V., Hinkley (1974). Theoretical Statistics. London: Chapman and Hall.
Cox, T.F., and M.A.A., Cox (2001). Multidimensional Scaling (2nd ed.). London: Chapman and Hall.
Cristianini, N., and J., Shawe-Taylor (2000). An Introduction to Support Vector Machines. Cambridge University Press.
Davies, C., P., Corena and M., Thomas (2012). South Australian grapevine data. CSIRO Plant Industry, Glen Osmond, Australia, personal communication.
Davies, P.I., and N.J., Higham (2000). Numerically stable generation of correlation matrices and their factors. BIT 40, 640–651.
Davies, P.M., and A.P.M., Coxon (1982). Key Texts in Multidimensional Scaling. London: Heinemann Educational Books.
De Bie, T., N., Cristianini and R., Rosipal (2005). Eigenproblems in pattern recognition. In E., Bayro-Corrochano (ed.), Handbook of Geometric Computing: Applications in Pattern Recognition, Computer Vision, Neuralcomputing, and Robotics, pp. 129–170. New York: Springer.
de Silva, V., and J.B., Tenenbaum (2004). Sparse multidimensional scaling using landmark points. Technical report, Stanford University.
Devroye, L., L., Gyorfi and G., Lugosi (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics. New York: Springer.
Diaconis, P., and D., Freedman (1984). Asymptotics of graphical projection pursuit. Ann. Stat. 12, 793–815.
Domeniconi, C., J., Peng and D., Gunopulos (2002). Locally adaptive metric nearest-neighbor classification. IEEE Trans. Pattern Anal. Machine Intell. PAMI-24, 1281–1285.
Domingos, P., and M., Pazzani (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29, 103–130.
Donoho, D.L. (2000). Nature vs. math: Interpreting independent component analysis in light of recent work in harmonic analysis. In Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation (ICA 2000), Helsinki, Finland, pp. 459–470.
Donoho, D.L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52, 1289–1306.
Donoho, D.L., and I.M., Johnstone (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. Comp. Rendus Acad. Sci. A 319, 1317–1322.
Dryden, I.L., and K.V., Mardia (1998). The Statistical Analysis of Shape. New York: Wiley.
Dudley, R.M. (2002). Real Analysis and Probability. Cambridge University Press.
Dudoit, S., J., Fridlyand and T.P., Speed (2002). Comparisons of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97, 77–87.
Duong, T., A., Cowling, I., Koch and M.P., Wand (2008). Feature significance for multivariate kernel density estimation. Comput. Stat. Data Anal. 52, 4225–4242.
Duong, T., and M.L., Hazelton (2005). Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand. J. Stat. 32, 485–506.
Elad, M. (2010). Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York: Springer.
Eriksson, J., and V., Koivunen (2003). Characteristic-function based independent component analysis. Signal Processing 83, 2195–2208.
Eslava, G., and F.H.C., Marriott (1994). Some criteria for projection pursuit. Stat. Comput. 4, 13–20.
Fan, J., and Y., Fan (2008). High-dimensional classification using features annealed independence rules. Ann. Stat. 36, 2605–2637.
Figueiredo, M.A.T., and A.K., Jain (2002). Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Machine Intell. PAMI-24, 381–396.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188.
Fix, E., and J., Hodges (1951). Discriminatory analysis, nonparametric discrimination: Consistency properties. Technical report, Randolph Field, TX, USAF School of Aviation Medicine.
Fix, E., and J., Hodges (1952). Discriminatory analysis: Small sample performance. Technical report, Randolph Field, TX, USAF School of Aviation Medicine.
Flury, B., and H., Riedwyl (1988). Multivariate Statistics: A Practical Approach. Cambridge University Press. Data set available at:
Fraley, C., and A., Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631.
Friedman, J.H. (1987). Exploratory projection pursuit. J. Am. Stat. Assoc. 82, 249–266.
Friedman, J.H. (1989). Regularized discriminant analysis. J. Am. Stat. Assoc. 84, 165–175.
Friedman, J.H. (1991). Multivariate adaptive regression splines. Ann. Stat. 19, 1–67.
Friedman, J.H., and W., Stuetzle (1981). Projection pursuit regression. J. Am. Stat. Assoc. 76, 817–823.
Friedman, J.H., W., Stuetzle and A., Schroeder (1984). Projection pursuit density estimation. J. Am. Stat. Assoc. 79, 599–608.
Friedman, J.H., and J.W., Tukey (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput. C-23, 881–890.
Gentle, J.E. (2007). Matrix Algebra. New York: Springer.
Gilmour, S., and I., Koch (2006). Understanding illicit drug markets with independent component analysis. Technical report, University of New South Wales.
Gilmour, S., I., Koch, L., Degenhardt and C., Day (2006). Identification and quantification of change in Australian illicit drug markets. BMC Public Health 6, 200–209.
Givan, A.L. (2001). Flow Cytometry: First Principles (2nd ed.). New York: Wiley-Liss.
Gokcay, E., and J.C., Principe (2002). Information theoretic clustering. IEEE Trans. Pattern Anal. Machine Intell. PAMI-24, 158–171.
Gordon, G.J., R. V., Jensen, L., Hsiao, S.R., Gullans, J.E., Blumenstock, S., Ramaswamy, W.G., Richards, D.J., Sugarbaker and R., Bueno (2002). Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 62, 4963–4967.
Gower, J.C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338.
Gower, J.C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika 55, 582–585.
Gower, J.C. (1971). Statistical methods of comparing different multivariate analyses of the same data. In F. R., Hodson, D., Kendall, and P., Tautu (eds.), Mathematics in the Archeological and Historical Sciences, pp. 138–149. Edinburgh University Press.
Gower, J.C., and W.J., Krzanowski (1999). Analysis of distance for structured multivariate data and extensions to multivariate analysis of variance. Appl. Stat. 48, 505–519.
Graef, J., and I., Spence (1979). Using distance information in the design of large multidimensional scaling experiments. Psychol. Bull. 86, 60–66.
Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. New York: Academic Press.
Greenacre, M.J. (2007). Correspondence Analysis in Practice (2nd ed.). London: Chapman and Hall/CRC Press.
Gustafsson, J.O.R. (2011). MALDI imaging mass spectrometry and its application to human disease. Ph.D. thesis, University of Adelaide.
Gustafsson, J.O.R., M. K., Oehler, A., Ruszkiewicz, S.R., McColl and P., Hoffmann (2011). MALDI imaging mass spectrometry (MALDI-IMS): Application of spatial proteomics for ovarian cancer classification and diagnosis. Int. J. Mol. Sci. 12, 773–794.
Guyon, I., and A., Elisseeff (2003). An introduction to variable and feature selection. J. Machine Learning Res. 3, 1157–1182.
Hall, P. (1988). Estimating the direction in which a data set is most interesting. Prob. Theory Relat. Fields 80, 51–77.
Hall, P. (1989a). On projection pursuit regression. Ann. Stat. 17, 573–588.
Hall, P. (1989b). Polynomial projection pursuit. Ann. Stat. 17, 589–605.
Hall, P., and K.-C., Li (1993). On almost linearity of low dimensional projections from high dimensional data. Ann. Stat. 21, 867–889.
Hall, P., J.S., Marron and A., Neeman (2005). Geometric representation of high dimension low sample size data. J.R. Stat. Soc. B (JRSS-B) 67, 427–444.
Hand, D.J. (2006). Classifier technology and the illusion of progress. Stat. Sci. 21, 1–14.
Harrison, D., and D.L., Rubinfeld (1978). Hedonic prices and the demand for clean air. J. Environ. Econ. Manage. 5, 81–102.
Hartigan, J. (1975). Clustering Algorithms. New York: Wiley.
Hartigan, J.A. (1967). Representation of similarity matrices by trees. J. Am. Stat. Assoc. 62, 1140–1158.
Harville, D.A. (1997). Matrix Algebra from a Statistician's Perspective. New York: Springer.
Hastie, T., and R., Tibshirani (1996). Discriminant adaptive nearest neighbor classification. IEEE Trans. Pattern Anal. Machine Intell. PAMI-18, 607–616.
Hastie, T., and R., Tibshirani (2002). Independent component analysis through product density estimation. In Proceedings of Neural Information Processing Systems, pp. 649–656.
Hastie, T., R., Tibshirani and J., Friedman (2001). The Elements of Statistical Learning – Data Mining, Inference, and Prediction. New York: Springer.
Helland, I.S. (1988). On the structure of partial least squares regression. Commun. Stat. Simul. Comput. 17, 581–607.
Helland, I.S. (1990). Partial least squares regression and statistical models. Scand. J. Stat. 17, 97–114.
Hérault, J., and B., Ans (1984). Circuits neuronaux à synapses modifiables: décodage de messages composites par apprentissage non supervisé. Comp. Rendus Acad. Sci. 299, 525–528.
Herault, J., C., Jutten and B., Ans (1985). Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. In Actes de Xeme colloque GRETSI, pp. 1017–1022, Nice, France.
Hinneburg, A., C.C., Aggarwal and D.A., Keim (2000). What is the nearest neighbor in high dimensional spaces? In Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, Egypt, pp. 506–515.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psych. 24, 417–441 and 498–520.
Hotelling, H. (1935). The most predictable criterion. J. Exp. Psychol. 26, 139–142.
Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28, 321–377.
Huber, P.J. (1985). Projection pursuit. Ann. Stat. 13, 435–475.
Hyvarinen, A. (1999). Fast and robust fixed-point algorithm for independent component analysis. IEEE Trans. Neural Networks 10, 626–634.
Hyvarinen, A., J., Karhunen and E., Oja (2001). Independent Component Analysis. New York: Wiley.
ICA Central (1999). Available at:
Inselberg, A. (1985). The plane with parallel coordinates. Visual Computer 1, 69–91.
Izenman, A.J. (2008). Modern Multivariate Statistical Techniques. New York: Springer.
Jeffers, J. (1967). Two case studies in the application of principal components. Appl. Stat. 16, 225–236.
Jing, J., I., Koch and K., Naito (2012). Polynomial histograms for multivariate density and mode estimation. Scand. J. Stat. 39, 75–96.
John, S. (1971). Some optimal multivariate tests. Biometrika 58, 123–127.
John, S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika 59, 169–173.
Johnstone, I.M. (2001). On the distribution of the largest principal component. Ann. Stat. 29, 295–327.
Johnstone, I.M., and A.Y., Lu (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693.
Jolliffe, I.T. (1989). Rotation of ill-defined principal components. Appl. Stat. 38, 139–147.
Jolliffe, I.T. (1995). Rotation of principal components: Choice of normalization constraints. J. Appl. Stat. 22, 29–35.
Jolliffe, I.T., N.T., Trendafilov and M., Uddin (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. 12, 531–547.
Jones, M.C. (1983). The projection pursuit algorithm for exploratory data analysis. Ph.D. thesis, University of Bath.
Jones, M.C., and R., Sibson (1987). What is projection pursuit? J. R. Stat. Soc. A (JRSS-A) 150, 1–36.
Joreskog, K.G. (1973). A general method for estimating a linear structural equation system. In A.S., Goldberger and O.D., Duncan (eds.), Structural Equation Models in the Social Sciences, pp. 85–112. San Francisco: Jossey-Bass.
Jung, S., and J.S., Marron (2009). PCA consistency in high dimension low sample size context. Ann. Stat. 37, 4104–4130.
Jung, S., A., Sen, and J.S., Marron (2012). Boundary behavior in high dimension, low sample size asymptotics of PCA. J. Multivar. Anal. 109, 190–203.
Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200.
Kendall, M., A., Stuart and J., Ord (1983). The Advanced Theory of Statistics, Vol. 3. London: Charles Griffin & Co.
Klemm, M., J., Haueisen and G., Ivanova (2009). Independent component analysis: Comparison of algorithms for the investigation of surface electrical brain activity. Med. Biol. Eng. Comput. 47, 413–423.
Koch, I., J.S., Marron and J., Chen (2005). Independent component analysis and simulation of non-Gaussian populations of kidneys. Technical Report.
Koch, I., and K., Naito (2007). Dimension selection for feature selection and dimension reduction with principal and independent component analysis. Neural Comput. 19, 513–545.
Koch, I., and K., Naito (2010). Prediction of multivariate responses with a selected number of principal components. Comput. Stat. Data Anal. 54, 1791–1807.
Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika 29, 1–27.
Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika 29, 115–129.
Kruskal, J.B. (1969). Toward a practical method which helps uncover the structure of a set of multivariate observations by finding the linear transformation which optimizes a new ‘index of condensation’. In R.C., Milton and J.A., Nelder (eds.), Statistical Computation, pp. 427–440. New York: Academic Press.
Kruskal, J.B. (1972). Linear transformation of multivariate data to reveal clustering. In R. N., Shepard, A.K., Romney and S.B., Nerlove (eds.), Multidimensional Scaling: Theory and Applications in the Behavioural Sciences, Vol. I, pp. 179–191. London: Seminar Press.
Kruskal, J.B., and M., Wish (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.
Krzanowski, W.J., and Y.T., Lai (1988). A criterion for determining the number of groups in a data set using sum of squares clustering. Biometrics 44, 23–34.
Kshirsagar, A.M. (1972). Multivariate Analysis. New York: Marcel Dekker.
Kullback, S. (1968). Probability densities with given marginals. Ann. Math. Stat. 39, 1236–1243.
Lawley, D.N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proc. R. Soc. Edinburgh A 60, 64–82.
Lawley, D.N. (1953). A modified method of estimation in factor analysis and some large sample results. In Uppsala Symposium on Psychological Factor Analysis, Vol. 17(19). Uppsala, Sweden: Almqvist and Wiksell, pp. 34–42.
Lawley, D.N., and A.E., Maxwell (1971). Factor Analysis as a Statistical Method. New York: Elsevier.
Lawrence, N. (2005). Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Machine Learning Res. 6, 1783–1816.
Learned-Miller, E.G., and J.W., Fisher (2003). ICA using spacings estimates of entropy. J. Machine Learning Res. 4, 1271–1295.
Lee, J.A., and M., Verleysen (2007). Nonlinear Dimensionality Reduction. New York: Springer.
Lee, S., F., Zou and F.A., Wright (2010). Convergence and prediction of principal component scores in high-dimensional settings. Ann. Stat. 38, 3605–3629.
Lee, T.-W. (1998). Independent Component Analysis: Theory and Applications. Boston: Kluwer Academic Publishers.
Lee, T.-W., M., Girolami, A.J., Bell and T.J., Sejnowski (2000). A unifying information-theoretic framework for independent component analysis. Comput. Math. Appl. 39, 1–21.
Lee, T.-W., M., Girolami and T.J., Sejnowski (1999). Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Comput. 11, 417–441.
Lee, Y.K., E.R., Lee, and B.U., Park (2012). Principal component analysis in very high-dimensional spaces. Stat. Sinica 22, 933–956.
Lemieux, C., I., Cloutier and J.-F., Tanguay (2008). Estrogen-induced gene expression in bone marrow c-kit+ stem cells and stromal cells: Identification of specific biological processes involved in the functional organization of the stem cell niche. Stem Cells Dev. 17, 1153–1164.
Leng, C., and H., Wang (2009). On general adaptive sparse principal component analysis. J. Comput. Graph. Stat. 18, 201–215.
Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. J. Am. Stat. Assoc. 87, 1025–1039.
Lu, A.Y. (2002). Sparse principal component analysis for functional data. Ph.D. thesis, Dept. of Statistics, Stanford University.
Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Stat. 41, 772–801.
Malkovich, J.F., and A.A., Afifi (1973). On tests for multivariate normality. J. Am. Stat. Assoc. 68, 176–179.
Mallat, S. (2009). A Wavelet Tour of Signal Processing: The Sparse Way (3rd ed.). New York: Academic Press.
Mammen, E., J.S., Marron and N.I., Fisher (1991). Some asymptotics for multimodality tests based on kernel density estimates. Prob. Theory Relat. Fields 91, 115–132.
Marčenko, V.A., and L.A., Pastur (1967). Distribution of eigenvalues of some sets of random matrices. Math. USSR-Sb 1, 507–536.
Mardia, K.V., J., Kent and J., Bibby (1992). Multivariate Analysis. London: Academic Press.
Marron, J.S. (2008). Matlab software. pcaSM.m and curvdatSM.m available at:
Marron, J.S., M.J., Todd and J., Ahn (2007). Distance-weighted discrimination. J. Am. Stat. Assoc. 102(480), 1267–1271.
McCullagh, P. (1987). Tensor Methods in Statistics. London: Chapman and Hall.
McCullagh, P., and J., Kolassa (2009). Cumulants. Scholarpedia 4, 4699.
McCullagh, P., and J.A., Nelder (1989). Generalized Linear Models (2nd ed.), Vol. 37 of Monographs on Statistics and Applied Probability. London: Chapman and Hall.
McLachlan, G., and K., Basford (1988). Mixture Models: Inference and Application to Clustering. New York: Marcel Dekker.
McLachlan, G., and D., Peel (2000). Finite Mixture Models. New York: Wiley.
Meulman, J.J. (1992). The integration of multidimensional scaling and multivariate analysis with optimal transformations. Psychometrika 57, 530–565.
Meulman, J.J. (1993). Principal coordinates analysis with optimal transformation of the variables – minimising the sum of squares of the smallest eigenvalues. Br. J. Math. Stat. Psychol. 46, 287–300.
Meulman, J.J. (1996). Fitting a distance model to homogeneous subsets of variables: Points of view analysis of categorical data. J. Classification 13, 249–266.
Miller, A. (2002). Subset Selection in Regression (2nd ed.), Vol. 95 of Monographs on Statistics and Applied Probability. London: Chapman and Hall.
Milligan, G.W., and M.C., Cooper (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179.
Minka, T.P. (2000). Automatic choice of dimensionality for PCA. Tech. Report 514, MIT. Available at
Minnotte, M.C. (1997). Nonparametric testing of the existence of modes. Ann. Stat. 25, 1646–1660.
Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Stat. 36, 2791–2817.
Nason, G. (1995). Three-dimensional projection pursuit. Appl. Stat. 44, 411–430.
Ogasawara, H. (2000). Some relationships between factors and components. Psychometrika 65, 167–185.
Oja, H., S., Sirkia and J., Eriksson (2006). Scatter matrices and independent component analysis. Aust. J. Stat. 35, 175–189.
Partridge, E. (1982). Origins, A Short Etymological Dictionary of Modern English (4th ed.). London: Routledge and Kegan Paul.
Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Stat. Sinica 17, 1617–1642.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–572.
Prasad, M.N., A., Sowmya, and I., Koch (2008). Designing relevant features for continuous data sets using ICA. Int. J. Comput. Intell. Appl. (IJCIA) 7, 447–468.
Pryce, J.D. (1973). Basic Methods of Linear Functional Analysis. London: Hutchinson.
Qiu, X., and L., Wu (2006). Nearest neighbor discriminant analysis. Int. J. Pattern Recog. Artif. Intell. 20, 1245–1259.
Quist, M., and G., Yona (2004). Distributional scaling: An algorithm for structure-preserving embedding of metric and nonmetric spaces. J. Machine Learning Res. 5, 399–420.
R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rai, C.S., and Y. Singh (2004). Source distribution models for blind source separation. Neurocomputing 57, 501–505.
Ramaswamy, S., P. Tamayo, R. Rifkin, S. Mukherjee, C. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J. Mesirov, T. Poggio, W. Gerald, M. Loda, E. Lander and T. Golub (2001). Multiclass cancer diagnosis using tumor gene expression signature. Proc. Natl. Acad. Sci. 98, 15149–15154.
Ramos, E., and D., Donoho (1983). Statlib datasets archive: Cars. Available at: datasets/.
Ramsay, J.O. (1982). Some statistical approaches to multidimensional scaling data. J.R. Stat. Soc. A (JRSS-A) 145, 285–312.
Rao, C. (1955). Estimation and tests of significance in factor analysis. Psychometrika 20, 93–111.
Richardson, M.W. (1938). Multidimensional psychophysics. Psychol. Bull. 35, 650–660.
Ripley, B.D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
Rosipal, R., and L.J. Trejo (2001). Kernel partial least squares regression in reproducing kernel Hilbert spaces. J. Machine Learning Res. 2, 97–123.
Rossini, A., J. Wan and Z. Moodie (2005). Rflowcyt: Statistical tools and data structures for analytic flow cytometry. R package, version 1.
Rousson, V., and T., Gasser (2004). Simple component analysis. J.R. Stat. Soc. C (JRSS-C) 53, 539–555.
Roweis, S., and Z. Ghahramani (1999). A unifying review of linear Gaussian models. Neural Comput. 11, 305–345.
Roweis, S.T., and L.K. Saul (2000). Nonlinear dimensionality reduction by local linear embedding. Science 290, 2323–2326.
Rudin, W. (1991). Functional Analysis (2nd ed.). New York: McGraw-Hill.
Sagae, M., D.W. Scott and N. Kusano (2006). A multivariate polynomial histogram by the method of local moments. In Proceedings of the 8th Workshop on Nonparametric Statistical Analysis and Related Area, Tokyo, pp. 14–33 (in Japanese).
Samarov, A., and A. Tsybakov (2004). Nonparametric independent component analysis. Bernoulli 10, 565–582.
Sammon, J.W. (1969). A nonlinear mapping for data structure analysis. IEEE Trans. Computers 18, 401–409.
Schneeweiss, H., and H. Mathes (1995). Factor analysis and principal components. J. Multivar. Anal. 55, 105–124.
Schoenberg, I.J. (1935). Remarks to Maurice Fréchet's article 'Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicable sur l'espace de Hilbert'. Ann. Math. 38, 724–732.
Schölkopf, B., and A. Smola (2002). Learning with Kernels. Support Vector Machines, Regularization, Optimization and Beyond. Cambridge, MA: MIT Press.
Schölkopf, B., A. Smola and K.-R. Müller (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10, 1299–1319.
Schott, J.R. (1996). Matrix Analysis for Statistics. New York: Wiley.
Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. New York: Wiley.
Searle, S.R. (1982). Matrix Algebra Useful for Statistics. New York: John Wiley.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
Shen, D., H. Shen and J.S. Marron (2012). A general framework for consistency of principal component analysis. arXiv:1211.2671.
Shen, D., H. Shen and J.S. Marron (2013). Consistency of sparse PCA in high dimension, low sample size. J. Multivar. Anal. 115, 317–333.
Shen, D., H. Shen, H. Zhu and J.S. Marron (2012). High dimensional principal component scores and data visualization. arXiv:1211.2679.
Shen, H., and J. Huang (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034.
Shepard, R.N. (1962a). The analysis of proximities: Multidimensional scaling with an unknown distance function I. Psychometrika 27, 125–140.
Shepard, R.N. (1962b). The analysis of proximities: Multidimensional scaling with an unknown distance function II. Psychometrika 27, 219–246.
Short, R.D., and K. Fukunaga (1981). Optimal distance measure for nearest neighbour classification. IEEE Trans. Inform. Theory IT-27, 622–627.
Silverman, B.W. (1981). Using kernel density estimates to investigate multimodality. J.R. Stat. Soc. B (JRSS-B) 43, 97–99.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Vol. 26 of Monographs on Statistics and Applied Probability. London: Chapman and Hall.
Starck, J.-L., E.J. Candès and D.L. Donoho (2002). The curvelet transform for image denoising. IEEE Trans. Image Processing 11, 670–684.
Strang, G. (2005). Linear Algebra and Its Applications (4th ed.). New York: Academic Press.
Tamatani, M., I. Koch and K. Naito (2012). Pattern recognition based on canonical correlations in a high dimension low sample size context. J. Multivar. Anal. 111, 350–367.
Tamatani, M., K. Naito and I. Koch (2013). Multi-class discriminant function based on canonical correlation in high dimension low sample size. Preprint.
Tenenbaum, J.B., V. de Silva and J.C. Langford (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J.R. Stat. Soc. B (JRSS-B) 58, 267–288.
Tibshirani, R., and G. Walther (2005). Cluster validation by prediction strength. J. Comput. Graph. Stat. 14, 511–528.
Tibshirani, R., G. Walther and T. Hastie (2001). Estimating the number of clusters in a dataset via the gap statistic. J.R. Stat. Soc. B (JRSS-B) 63, 411–423.
Tipping, M.E., and C.M. Bishop (1999). Probabilistic principal component analysis. J.R. Stat. Soc. B (JRSS-B) 61, 611–622.
Torgerson, W.S. (1952). Multidimensional scaling: 1. Theory and method. Psychometrika 17, 401–419.
Torgerson, W.S. (1958). Theory and Method of Scaling. New York: Wiley.
Torokhti, A., and S. Friedland (2009). Towards theory of generic principal component analysis. J. Multivar. Anal. 100, 661–669.
Tracy, C.A., and H. Widom (1996). On orthogonal and symplectic matrix ensembles. Commun. Math. Phys. 177, 727–754.
Tracy, C.A., and H. Widom (2000). The distribution of the largest eigenvalue in the Gaussian ensembles. In J. van Diejen and L. Vinet (eds.), Calogero-Moser-Sutherland Models, pp. 461–472. New York: Springer.
Trosset, M.W. (1997). Computing distances between convex sets and subsets of the positive semidefinite matrices. Technical Rep. 97-3, Rice University.
Trosset, M.W. (1998). A new formulation of the nonmetric strain problem in multidimensional scaling. J. Classification 15, 15–35.
Tucker, L.R., and S. Messick (1963). An individual differences model for multidimensional scaling. Psychometrika 28, 333–367.
Tyler, D.E., F. Critchley, L. Dümbgen and H. Oja (2009). Invariant coordinate selection. J.R. Stat. Soc. B (JRSS-B) 71, 549–592.
van't Veer, L.J., H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards and S.H. Friend (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.
Vapnik, V. (1998). Statistical Learning Theory. New York: Wiley.
Vapnik, V., and A. Chervonenkis (1979). Theorie der Zeichenerkennung. Berlin: Akademie-Verlag (German translation from the original Russian, published in 1974).
Vasicek, O. (1976). A test for normality based on sample entropy. J.R. Stat. Soc. B (JRSS-B) 38, 54–59.
Venables, W.N., and B.D., Ripley (2002). Modern Applied Statistics with S (4th ed.). New York: Springer.
Vines, S.K. (2000). Simple principal components. Appl. Stat. 49, 441–451.
Vlassis, N., and Y. Motomura (2001). Efficient source adaptivity in independent component analysis. IEEE Trans. Neural Networks 12, 559–565.
von Storch, H., and F.W. Zwiers (1999). Statistical Analysis in Climate Research. Cambridge University Press.
Wand, M.P., and M.C. Jones (1995). Kernel Smoothing. London: Chapman and Hall.
Wegman, E. (1992). The grand tour in k-dimensions. In Computing Science and Statistics. New York: Springer, pp. 127–136.
Williams, R.H., D.W. Zimmerman, B.D. Zumbo and D. Ross (2003). Charles Spearman: British behavioral scientist. Human Nature Rev. 3, 114–118.
Winther, O., and K.B. Petersen (2007). Bayesian independent component analysis: Variational methods and non-negative decompositions. Digital Signal Processing 17, 858–872.
Witten, D.M., and R. Tibshirani (2010). A framework for feature selection in clustering. J. Am. Stat. Assoc. 105, 713–726.
Witten, D.M., R. Tibshirani and T. Hastie (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515–534.
Witten, I.H., and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.). San Francisco: Morgan Kaufmann.
Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In P.R. Krishnaiah (ed.), Multivariate Analysis, pp. 391–420. New York: Academic Press.
Xu, R., and D. Wunsch II (2005). Survey of clustering algorithms. IEEE Trans. Neural Networks 16, 645–678.
Yeredor, A. (2000). Blind source separation via the second characteristic function. Signal Processing 80, 897–902.
Young, G., and A.S. Householder (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika 3, 19–22.
Zass, R., and A. Shashua (2007). Nonnegative sparse PCA. In B. Schölkopf, J. Platt and T. Hofmann (eds.), Advances in Neural Information Processing Systems (NIPS-2006), Vol. 19, p. 1561. Cambridge, MA: MIT Press.
Zaunders, J., J. Jing, A.D. Kelleher and I. Koch (2012). Computationally efficient analysis of complex flow cytometry data using second order polynomial histograms. Technical Rep., St Vincent's Centre for Applied Medical Research, St Vincent's Hospital, Sydney, Australia.
Zou, H., and T. Hastie (2005). Regularization and variable selection with the elastic net. J.R. Stat. Soc. B (JRSS-B) 67, 301–320.
Zou, H., T. Hastie and R. Tibshirani (2006). Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286.

