
5 - Sample Complexity Bounds for Dictionary Learning from Vector- and Tensor-Valued Data

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, University College London
Yonina C. Eldar, Weizmann Institute of Science, Israel

Summary

Dictionary learning has emerged as a powerful method for data-driven feature extraction. The initial focus was algorithmic, but there has recently been increasing interest in the theoretical underpinnings of dictionary learning. These rely on information-theoretic analytic tools and help us understand the fundamental limitations of dictionary-learning algorithms. We focus on theoretical aspects and summarize results on dictionary learning from vector- and tensor-valued data. Results are stated as lower and upper bounds on the sample complexity of dictionary learning, defined as the number of samples needed to identify or reconstruct the true dictionary underlying the data from noiseless or noisy samples, respectively. Many of the analytic tools that yield these results come from information theory, including restating the dictionary-learning problem as a channel-coding problem and connecting the analysis of minimax risk in statistical estimation to Fano's inequality. In addition to highlighting the effects of different parameters on the sample complexity of dictionary learning, we show the potential advantages of dictionary learning from tensor data and present unaddressed problems.
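To fix ideas, the following is a minimal sketch of the standard generative model and the minimax formulation that such sample-complexity bounds are stated for. The notation (dimensions m and p, sparsity level s, sample size N, dictionary class D) is illustrative here, not the chapter's own definitions:

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Generative model: each sample is a sparse combination of dictionary
% columns observed in noise.
\[
  \mathbf{y}_n = \mathbf{D}\mathbf{x}_n + \mathbf{w}_n,
  \qquad \mathbf{D} \in \mathbb{R}^{m \times p},\quad
  \|\mathbf{x}_n\|_0 \le s,\quad n = 1,\dots,N.
\]
% Minimax risk of estimating D from the samples Y = (y_1, ..., y_N):
% the best worst-case error achievable by any estimator over the class D.
\[
  \varepsilon^\ast
  = \inf_{\widehat{\mathbf{D}}}\,
    \sup_{\mathbf{D} \in \mathcal{D}}\,
    \mathbb{E}\,\bigl\|\widehat{\mathbf{D}}(\mathbf{Y}) - \mathbf{D}\bigr\|_F^2 .
\]
% Channel-coding view: treat the true dictionary as a "codeword" drawn
% uniformly from a packing {D_1, ..., D_L} of pairwise well-separated
% dictionaries in D. Fano's inequality lower-bounds the error of any
% estimator that tries to decode it from the observations:
\[
  \mathbb{P}\bigl[\widehat{\mathbf{D}} \ne \mathbf{D}\bigr]
  \;\ge\; 1 - \frac{I(\mathbf{Y};\mathbf{D}) + \log 2}{\log L},
\]
% so driving the risk down forces the mutual information I(Y; D), and
% hence the number of samples N, to grow with log L.
\end{document}

For tensor-valued data the chapter's results concern Kronecker-structured dictionaries of the form D = D_1 ⊗ ... ⊗ D_K; the reduced number of free parameters in this factorization is what underlies the potential sample-complexity advantages over unstructured dictionary learning.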

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


