Large Sample Covariance Matrices and High-Dimensional Data Analysis

Book description

High-dimensional data appear in many fields, and their analysis has become increasingly important in modern statistics. However, it has long been observed that several well-known methods in multivariate analysis become inefficient, or even misleading, when the data dimension p is larger than, say, several tens. A seminal example is the well-known inefficiency of Hotelling's T²-test in such cases. This example shows that classical large sample limits may no longer hold for high-dimensional data; statisticians must seek new limiting theorems in these instances. Thus, the theory of random matrices (RMT) serves as a much-needed and welcome alternative framework. Based on the authors' own research, this book provides a firsthand introduction to new high-dimensional statistical methods derived from RMT. The book begins with a detailed introduction to useful tools from RMT, and then presents a series of high-dimensional problems with solutions provided by RMT methods.


'This is the first book which treats systematic corrections to the classical multivariate statistical procedures so that the resultant procedures can be used for high-dimensional data. The corrections have been done by employing asymptotic tools based on the theory of random matrices.'

Yasunori Fujikoshi - Hiroshima University, Japan

'… this book is the first to cover these topics and can serve both as a good introduction to the topics as well as a comprehensive reference on the state of the art.'

Robert Stelzer Source: MathSciNet

'This book deals with the analysis of covariance matrices under two different assumptions: large-sample theory and high-dimensional-data theory. While the former approach is the classical framework to derive asymptotics, nevertheless the latter has received increasing attention due to its applications in the emerging field of big-data. Due to its novelty and its relevance in the current research, the authors focus mainly on the high-dimensional-data framework. … The theory and the applications are presented under both the large-sample theory and the high-dimensional-data theory, and thus the reader can easily appreciate the differences between the two approaches. The material is presented in a quite simple manner, and the reader only needs some pre-requisites in basic mathematical statistics, linear algebra, and theory of multivariate normal distributions. Some technical prerequisites are collected in two appendices. Therefore, the book can be used by graduate students and researchers in a wide range of disciplines, ranging from mathematics to applied sciences.'

Fabio Rapallo Source: Zentralblatt MATH

Anderson, G. W., Guionnet, A., and Zeitouni, O. 2010. An introduction to random matrices. Cambridge Studies in Advanced Mathematics, vol. 118. Cambridge: Cambridge University Press.
Anderson, T. W. 2003. An introduction to multivariate statistical analysis. 3rd edn. Hoboken, NJ: John Wiley.
Anderson, T. W., and Amemiya, Y. 1988. The asymptotic normal distribution of estimators in factor analysis under general conditions. Ann. Statist., 16(2), 759–771.
Anderson, T. W., and Rubin, H. 1956. Statistical inference in factor analysis. Pages 111–150 of Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, vol. 5. Berkeley: University of California Press.
Arnold, L. 1967. On the asymptotic distribution of the eigenvalues of random matrices. J. Math. Anal. Appl., 20, 262–268.
Arnold, L. 1971. On Wigner's semicircle law for the eigenvalues of random matrices. Z. Wahrsch. Verw. Gebiete, 19, 191–198.
Bai, Z. 1985. A note on limiting distribution of the eigenvalues of a class of random matrices. J. Math. Res. Exposition, 5(2), 113–118.
Bai, Z. 1999. Methodologies in spectral analysis of large dimensional random matrices: A review. Stat. Sin., 9, 611–677. With comments by G. J. Rodgers and Jack W. Silverstein; and a rejoinder by the author.
Bai, Z. 2005. High dimensional data analysis. Cosmos, 1(1), 17–27.
Bai, Z., and Ding, X. 2012. Estimation of spiked eigenvalues in spiked models. Random Matrices Theory Appl., 1(2), 1150011.
Bai, Z., and Saranadasa, H. 1996. Effect of high dimension: By an example of a two sample problem. Stat. Sin., 6(2), 311–329.
Bai, Z., and Silverstein, J. W. 1998. No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices. Ann. Probab., 26, 316–345.
Bai, Z., and Silverstein, J. W. 1999. Exact separation of eigenvalues of large dimensional sample covariance matrices. Ann. Probab., 27(3), 1536–1555.
Bai, Z., and Silverstein, J. W. 2004. CLT for linear spectral statistics of large-dimensional sample covariance matrices. Ann. Probab., 32, 553–605.
Bai, Z., and Silverstein, J. W. 2010. Spectral Analysis of Large Dimensional Random Matrices. 2nd ed. New York: Springer.
Bai, Z., and Yao, J. 2008. Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat., 44(3), 447–474.
Bai, Z., and Yao, J. 2012. On sample eigenvalues in a generalized spiked population model. J. Multivariate Anal., 106, 167–177.
Bai, Z., and Yin, Y. Q. 1988. A convergence to the semicircle law. Ann. Probab., 16(2), 863–875.
Bai, Z., Yin, Y. Q., and Krishnaiah, P. R. 1986. On limiting spectral distribution of product of two random matrices when the underlying distribution is isotropic. J. Multivariate Anal., 19, 189–200.
Bai, Z., Yin, Y. Q., and Krishnaiah, P. R. 1987. On the limiting empirical distribution function of the eigenvalues of a multivariate F-matrix. Probab. Theory Appl., 32, 490–500.
Bai, Z., Miao, B. Q., and Pan, G. M. 2007. On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab., 35(4), 1532–1572.
Bai, Z., Jiang, D., Yao, J., and Zheng, S. 2009a. Corrections to LRT on large-dimensional covariance matrix by RMT. Ann. Stat., 37(6B), 3822–3840.
Bai, Z., Liu, H., and Wong, W. 2009b. Enhancement of the applicability of Markowitz's portfolio optimization by utilizing random matrix theory. Math. Finance, 19, 639–667.
Bai, Z., Chen, J., and Yao, J. 2010. On estimation of the population spectral distribution from a high-dimensional sample covariance matrix. Aust. N. Z. J. Stat., 52(4), 423–437.
Bai, Z., Li, H., and Wong, W. K. 2013a. The best estimation for high-dimensional Markowitz mean-variance optimization. Tech. rept. Northeast Normal University, Changchun.
Bai, Z., Jiang, D., Yao, J., and Zheng, S. 2013c. Testing linear hypotheses in high-dimensional regressions. Statistics, 47(6), 1207–1223.
Baik, J., and Silverstein, J. W. 2006. Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal., 97, 1382–1408.
Baik, J., Ben Arous, G., and Péché, S. 2005. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab., 33(5), 1643–1697.
Bartlett, M. S. 1934. The vector representation of a sample. Proc. Cambridge Philos. Soc., 30, 327–340.
Bartlett, M. S. 1937. Properties of sufficiency and statistical tests. Proc. R. Soc. London Ser. A, 160, 268–282.
Benaych-Georges, F., and Nadakuditi, R. R. 2011. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math., 227(1), 494–521.
Benaych-Georges, F., Guionnet, A., and Maida, M. 2011. Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices. Electron. J. Probab., 16(60), 1621–1662.
Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289–300.
Bickel, P., and Levina, E. 2004. Some theory for Fisher's linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli, 10(6), 989–1010.
Billingsley, P. 1968. Convergence of probability measures. New York: John Wiley.
Birke, M., and Dette, H. 2005. A note on testing the covariance matrix for large dimension. Statistics and Probability Letters, 74, 281–289.
Bouchaud, J. P., and Potters, M. 2011. The Oxford handbook of random matrix theory. Oxford: Oxford University Press.
Box, G. E. P. 1949. A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.
Buja, A., Hastie, T., and Tibshirani, R. 1995. Penalized discriminant analysis. Ann. Statist., 23, 73–102.
Canner, N., Mankiw, N. G., and Weil, D. N. 1997. An asset allocation puzzle. Am. Econ. Rev., 87(1), 181–191.
Capitaine, M., Donati-Martin, C., and Féral, D. 2009. The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann. Probab., 37(1), 1–47.
Chen, J., Delyon, B., and Yao, J. 2011. On a model selection problem from high-dimensional sample covariance matrices. J. Multivariate Anal., 102, 1388–1398.
Chen, S. X., and Qin, Y.-L. 2010. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat., 38, 808–835.
Chen, S. X., Zhang, L.-X., and Zhong, P.-S. 2010. Tests for high-dimensional covariance matrices. J. Am. Stat. Assoc., 105, 810–819.
Cheng, Y. 2004. Asymptotic probabilities of misclassification of two discriminant functions in cases of high dimensional data. Stat. Probab. Lett., 67, 9–17.
Delyon, B. 2010. Concentration inequalities for the spectral measure of random matrices. Electron. Commun. Probab., 15, 549–562.
Dempster, A. P. 1958. A high dimensional two sample significance test. Ann. Math. Stat., 29, 995–1010.
Dempster, A. P. 1960. A significance test for the separation of two highly multivariate small samples. Biometrics, 16, 41–50.
El Karoui, N. 2008. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Stat., 36(6), 2757–2790.
Fan, J., Feng, Y., and Tong, X. 2012. A road to classification in high dimensional space: The regularized optimal affine discriminant. J. R. Stat. Soc. Ser. B, 74(4), 745–771.
Féral, D., and Péché, S. 2007. The largest eigenvalue of rank one deformation of large Wigner matrices. Comm. Math. Phys., 272(1), 185–228.
Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7, 179–188.
Frankfurter, G. M., Phillips, H. E., and Seagle, J. P. 1971. Portfolio selection: The effects of uncertain means, variances and covariances. J. Finan. Quant. Anal., 6, 1251–1262.
Giroux, A. 2013. Analyse Complexe (cours et exercices corrigés). Tech. rept. Département de mathématiques et statistique, Université de Montréal,
Gnedenko, B. V., and Kolmogorov, A. N. 1948. Limit distributions for sums of independent random variables. Cambridge, MA: Addison-Wesley. [translated from the Russian and annotated by K. L. Chung, with appendix by J. L. Doob (1954)].
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531–537.
Grenander, U. 1963. Probabilities on Algebraic Structures. New York: John Wiley.
Grenander, U., and Silverstein, J. 1977. Spectral analysis of networks with random topologies. SIAM J. Appl. Math., 32, 499–519.
Guo, Y., Hastie, T., and Tibshirani, R. 2005. Regularized discriminant analysis and its application in microarrays. Biostatistics, 1(1), 1–18. R package downloadable at
Hastie, T., Tibshirani, R., and Friedman, J. 2009. The elements of statistical learning. 2nd ed. New York: Springer.
Hotelling, H. 1931. The generalization of Student's ratio. Ann. Math. Stat., 2, 360–378.
Huber, P. J. 1973. The 1972 Wald Memorial Lecture. Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat., 35, 73–101.
Jiang, D., Bai, Z., and Zheng, S. 2013. Testing the independence of sets of large-dimensional variables. Sci. China Math., 56(1), 135–147.
Jing, B. Y., Pan, G. M., Shao, Q.-M., and Zhou, W. 2010. Nonparametric estimate of spectral density functions of sample covariance matrices: A first step. Ann. Stat., 38, 3724–3750.
John, S. 1971. Some optimal multivariate tests. Biometrika, 58, 123–127.
John, S. 1972. The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59, 169–173.
Johnstone, I. 2001. On the distribution of the largest eigenvalue in principal components analysis. Ann. Stat., 29(2), 295–327.
Johnstone, I. 2007. High dimensional statistical inference and random matrices. Pages 307–333 of International Congress of Mathematicians, Vol. I. Zürich: Eur. Math. Soc.
Johnstone, I., and Titterington, D. 2009. Statistical challenges of high-dimensional data. Philos. Trans. R. Soc. London, Ser. A, 367(1906), 4237–4253.
Jonsson, D. 1982. Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal., 12(1), 1–38.
Kreĭn, M. G., and Nudel′man, A. A. 1977. The Markov Moment Problem and Extremal Problems. Providence, RI: American Mathematical Society. [Ideas and problems of P. L. Čebyšev and A. A. Markov and their further development, translated from the Russian by D. Louvish, Translations of Mathematical Monographs, Vol. 50.]
Kritchman, S., and Nadler, B. 2008. Determining the number of components in a factor model from limited noisy data. Chem. Int. Lab. Syst., 94, 19–32.
Kritchman, S., and Nadler, B. 2009. Non-parametric detection of the number of signals: Hypothesis testing and random matrix theory. IEEE Trans. Signal Process., 57(10), 3930–3941.
Laloux, L., Cizeau, P. P., Bouchaud, J., and Potters, M. 1999. Noise dressing of financial correlation matrices. Phys. Rev. Lett., 83, 1467–1470.
Ledoit, O., and Wolf, M. 2002. Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Stat., 30, 1081–1102.
Li, J., and Chen, S. X. 2012. Two sample tests for high dimensional covariance matrices. Ann. Stat., 40, 908–940.
Li, W. M., and Yao, J. 2012. A local moments estimation of the spectrum of a large dimensional covariance matrix. Tech. rept. arXiv:1302.0356.
Li, W., Chen, J., Qin, Y., Bai, Z., and Yao, J. 2013. Estimation of the population spectral distribution from a large dimensional sample covariance matrix. J. Stat. Plann. Inference, 143(11), 1887–1897.
Li, Z., and Yao, J. 2014. On two simple but effective procedures for high dimensional classification of general populations. Tech. rept. arXiv:1501.01763.
Lytova, A., and Pastur, L. 2009. Central limit theorem for linear eigenvalue statistics of the Wigner and the sample covariance random matrices. Ann. Probab., 37, 1778–1840.
Marčenko, V. A., and Pastur, L. A. 1967. Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sb, 1, 457–483.
Markowitz, H. M. 1952. Portfolio selection. J. Finance, 7, 77–91.
Mehta, M. L. 2004. Random matrices. 3rd ed. New York: Academic Press.
Mestre, X. 2008. Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Trans. Inform. Theory, 54, 5113–5129.
Michaud, R. O. 1989. The Markowitz optimization enigma: Is “optimized” optimal? Financial Anal., 45, 31–42.
Nadler, B. 2010. Nonparametric detection of signals by information theoretic criteria: Performance analysis and an improved estimator. IEEE Trans. Signal Process., 58(5), 2746–2756.
Nagao, H. 1973a. Asymptotic expansions of the distributions of Bartlett's test and sphericity test under the local alternatives. Ann. Inst. Stat. Math., 25, 407–422.
Nagao, H. 1973b. On some test criteria for covariance matrix. Ann. Stat., 1, 700–709.
Nica, A., and Speicher, R. 2006. Lectures on the combinatorics of free probability. New York: Cambridge University Press.
Pafka, S., and Kondor, I. 2004. Estimated correlation matrices and portfolio optimization. Phys. A, 343, 623–634.
Pan, G. M. 2014. Comparison between two types of large sample covariance matrices. Ann. Inst. Henri Poincaré Probab. Stat., 50, 655–677.
Pan, G. M., and Zhou, W. 2008. Central limit theorem for signal-to-interference ratio of reduced rank linear receiver. Ann. Appl. Probab., 18, 1232–1270.
Passemier, D., and Yao, J. 2012. On determining the number of spikes in a high-dimensional spiked population model. Random Matrices Theory Appl., 1, 1150002.
Passemier, D., and Yao, J. 2013. Variance estimation and goodness-of-fit test in a high-dimensional strict factor model. Tech. rept. arXiv:1308.3890.
Passemier, D., and Yao, J. 2014. On the detection of the number of spikes, possibly equal, in the high-dimensional case. J. Multivariate Anal., 127, 173–183.
Pastur, L. A. 1972. On the spectrum of random matrices. Theoret. Math. Phys., 10, 67–74.
Pastur, L. A. 1973. Spectra of random self-adjoint operators. Russian Math. Surv., 28, 1–67.
Pastur, L., and Shcherbina, M. 2011. Eigenvalue distribution of large random matrices. Mathematical Surveys and Monographs, vol. 171. Providence, RI: American Mathematical Society.
Paul, D. 2007. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Stat. Sin., 17, 1617–1642.
Paul, D., and Aue, A. 2014. Random matrix theory in statistics: A review. J. Stat. Plann. Inference, 150, 1–29.
Péché, S. 2006. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probab. Theory Related Fields, 134(1), 127–173.
Petrov, V. V. 1975. Sums of independent random variables. New York: Springer.
Pizzo, A., Renfrew, D., and Soshnikov, A. 2013. On finite rank deformations of Wigner matrices. Ann. Inst. Henri Poincaré Probab. Stat., 49(1), 64–94.
Raj Rao, N. 2006. RMTool-A random matrix calculator in MATLAB. ajnrao/rmtool/.
Rao, N. R., Mingo, J. A., Speicher, R., and Edelman, A. 2008. Statistical eigen-inference from large Wishart matrices. Ann. Statist., 36(6), 2850–2885.
Renfrew, D., and Soshnikov, A. 2013. On finite rank deformations of Wigner matrices II: Delocalized perturbations. Random Matrices Theory Appl., 2(1), 1250015.
Saranadasa, H. 1993. Asymptotic expansion of the misclassification probabilities of D-and A-criteria for discrimination from two high dimensional populations using the theory of large dimensional random matrices. J. Multivariate Anal., 46, 154–174.
Sheather, S. J., and Jones, M. C. 1991. A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B, 53, 683–690.
Silverstein, J. W. 1985. The limiting eigenvalue distribution of a multivariate F matrix. SIAM J. Math. Anal., 16(3), 641–646.
Silverstein, J. W., and Choi, S.-I. 1995. Analysis of the limiting spectral distribution of large-dimensional random matrices. J. Multivariate Anal., 54(2), 295–309.
Silverstein, J. W., and Combettes, P. L. 1992. Signal detection via spectral theory of large dimensional random matrices. IEEE Trans. Signal Process., 40, 2100–2104.
Srivastava, M. S. 2005. Some tests concerning the covariance matrix in high dimensional data. J. Jpn. Stat. Soc., 35(2), 251–272.
Srivastava, M. S., Kollo, T., and von Rosen, D. 2011. Some tests for the covariance matrix with fewer observations than the dimension under non-normality. J. Multivariate Anal., 102, 1090–1103.
Sugiura, N., and Nagao, H. 1968. Unbiasedness of some test criteria for the equality of one or two covariance matrices. Ann. Math. Stat., 39, 1686–1692.
Szegö, G. 1959. Orthogonal polynomials. New York: American Mathematical Society.
Tao, T. 2012. Topics in random matrix theory. Graduate Studies in Mathematics, vol. 132. Providence, RI: American Mathematical Society.
Ulfarsson, M. O., and Solo, V. 2008. Dimension estimation in noisy PCA with SURE and random matrix theory. IEEE Trans. Signal Process., 56(12), 5804–5816.
Wachter, K. W. 1978. The strong limits of random matrix spectra for sample matrices of independent elements. Ann. Probab., 6(1), 1–18.
Wachter, K. W. 1980. The limiting empirical measure of multiple discriminant ratios. Ann. Stat., 8, 937–957.
Wang, Q., and Yao, J. 2013. On the sphericity test with large-dimensional observations. Electr. J. Stat., 7, 2164–2192.
Wax, M., and Kailath, T. 1985. Detection of signals by information theoretic criteria. IEEE Trans. Acoust. Speech Signal Process., 33(2), 387–392.
Wigner, E. P. 1955. Characteristic vectors of bordered matrices with infinite dimensions. Ann. Math., 62, 548–564.
Wigner, E. P. 1958. On the distributions of the roots of certain symmetric matrices. Ann. Math., 67, 325–327.
Wilks, S. S. 1932. Certain generalizations in the analysis of variance. Biometrika, 24, 471–494.
Wilks, S. S. 1934. Moment-generating operators for determinants of product moments in samples from a normal system. Ann. Math., 35, 312–340.
Yin, Y. Q. 1986. Limiting spectral distribution for a class of random matrices. J. Multivariate Anal., 20(1), 50–68.
Yin, Y. Q., and Krishnaiah, P. R. 1983. A limit theorem for the eigenvalues of product of two random matrices. J. Multivariate Anal., 13, 489–507.
Zheng, S. 2012. Central limit theorem for linear spectral statistics of large dimensional F matrix. Ann. Inst. Henri Poincaré Probab. Statist., 48, 444–476.
Zheng, S., Bai, Z., and Yao, J. 2015. Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. The Annals of Statistics (accepted).
Zheng, S., Jiang, D., Bai, Z., and He, X. 2014. Inference on multiple correlation coefficients with moderately high dimensional data. Biometrika, 101(3), 748–754.

