
2 - Sparsity-aware distributed learning

from Part I - Mathematical foundations

Published online by Cambridge University Press: 18 December 2015

Symeon Chouvardas, University of Athens, Greece
Yannis Kopsinis, University of Athens, Greece
Sergios Theodoridis, University of Athens, Greece
Shuguang Cui, Texas A&M University
Alfred O. Hero, III, University of Michigan, Ann Arbor
Zhi-Quan Luo, University of Minnesota
José M. F. Moura, Carnegie Mellon University, Pennsylvania

Summary

In this chapter, the problem of sparsity-aware distributed learning is studied. In particular, we consider the setup of an ad hoc network, the nodes of which are tasked to estimate, in a collaborative way, a sparse parameter vector of interest. Both batch and online algorithms are discussed. In the batch learning context, the distributed LASSO algorithm and a distributed greedy technique are presented. Turning to online learning, an LMS-based sparsity-promoting algorithm, built around the l1 norm, as well as a greedy distributed LMS are discussed, and a set-theoretic sparsity-promoting distributed technique is examined. Finally, the performance of the presented algorithms is validated in several scenarios.
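
To give a concrete feel for the online, sparsity-promoting distributed setting described above, the following minimal sketch simulates a zero-attracting diffusion LMS over a small ad hoc network. It is an illustration in the spirit of such algorithms, not the chapter's exact methods; the ring topology, step size mu, l1 weight rho, and the synthetic data model are all assumptions made for the example.

```python
# Minimal sketch (illustrative, not the chapter's exact algorithms): a
# zero-attracting diffusion LMS. Each node adapts on its own streaming
# measurements and then combines (averages) its neighbours' estimates.
import numpy as np

rng = np.random.default_rng(0)
K, m = 6, 50                       # number of nodes, parameter dimension
w_true = np.zeros(m)
w_true[rng.choice(m, 5, replace=False)] = rng.standard_normal(5)  # sparse target

# Assumed ring topology: each node averages with its two neighbours.
A = np.zeros((K, K))
for k in range(K):
    A[k, [k, (k - 1) % K, (k + 1) % K]] = 1.0 / 3.0

mu, rho = 0.01, 1e-3               # assumed step size and l1 (zero-attractor) weight
W = np.zeros((K, m))               # per-node estimates

for _ in range(2000):
    # Adaptation step: node k observes y_k = x_k^T w_true + noise.
    psi = np.empty_like(W)
    for k in range(K):
        x = rng.standard_normal(m)
        y = x @ w_true + 0.05 * rng.standard_normal()
        e = y - x @ W[k]
        psi[k] = W[k] + mu * e * x - mu * rho * np.sign(W[k])  # l1 attraction to zero
    # Combination step: convex averaging over each neighbourhood.
    W = A @ psi

print("max estimation error over all nodes:", np.abs(W - w_true).max())
```

The sign term acts as a subgradient of the l1 norm, shrinking inactive coefficients toward zero, while the combination step lets the nodes agree on a common estimate without any fusion center.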

Introduction

The volume of data captured worldwide is growing at an exponential rate, posing significant challenges for their processing and analysis. Data mining, regression, and prediction/forecasting have played a leading role in gaining insights and extracting useful information from raw data. Such techniques cover a wide range of applications in several areas, such as biomedicine, econometrics, sales forecasting, and content preference modeling. The massive amount of data produced, together with their increased complexity (new types of data keep emerging) and their involvement in the Internet of Things paradigm [1], calls for further advances in established machine learning techniques in order to cope with the new challenges.

Even though data tend to live in high-dimensional spaces, they often exhibit a high degree of redundancy; that is, their useful information can be represented by a number of attributes much smaller than their original dimensionality. Often, this redundancy can be effectively exploited by treating the data in a transformed domain, in which they can be represented by sparse models, i.e., models comprising only a few nonzero parameters. Moreover, sparsity is an attribute met in a plethora of models of natural signals, since nature tends to be parsimonious. Such sparse structures can be exploited in big data applications in order to reduce processing demands. The advent of compressed sensing has led to novel theoretical as well as algorithmic tools, which can be efficiently employed for sparsity-aware learning, e.g., [2–7].
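
As a simple illustration of this idea, the sketch below recovers a sparse vector from fewer measurements than its dimension using iterative soft thresholding (ISTA), a standard compressed-sensing recovery method. The problem sizes and the regularization weight are illustrative assumptions, not values taken from the text.

```python
# Minimal sketch: sparse recovery by iterative soft thresholding (ISTA).
# A vector with only s nonzero entries is recovered from n < m random
# linear measurements; all sizes and the l1 weight lam are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 40, 100, 5               # measurements, dimension, sparsity level
A = rng.standard_normal((n, m)) / np.sqrt(n)
x_true = np.zeros(m)
x_true[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true                     # noiseless measurements

lam = 0.01                         # l1 regularization weight
L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
x = np.zeros(m)
for _ in range(1000):
    grad = A.T @ (A @ x - y)       # gradient of the data-fit term
    z = x - grad / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding

print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

With far fewer measurements than unknowns, the l1 penalty drives most coefficients to zero, so only the few truly active parameters survive the thresholding.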

In many cases, processing large amounts of data is not only cumbersome but may prove infeasible due to a lack of processing power and/or storage capabilities.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2016


References

[1] L. Atzori, A. Iera, and G. Morabito, "The internet of things: a survey," Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[2] E. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies," IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
[3] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete Fourier information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[4] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
[5] E. J. Candès, M. B. Wakin, and S. P. Boyd, "Enhancing sparsity by reweighted l1 minimization," The Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
[6] M. F. Duarte and Y. Eldar, "Structured compressed sensing: From theory to applications," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, 2011.
[7] S. Theodoridis, Y. Kopsinis, and K. Slavakis, "Sparsity-aware learning and compressed sensing: An overview," 2014, to appear in the E-Reference Signal Processing, Elsevier.
[8] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun, "Map-reduce for machine learning on multicore," in NIPS, vol. 6, 2006, pp. 281–288.
[9] J. Lin, "MapReduce is good enough? If all you have is a hammer, throw away everything that's not a nail!" Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[10] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.
[11] A. H. Sayed, "Diffusion adaptation over networks," arXiv preprint arXiv:1205.4220, 2012.
[12] G. Mateos, J. Bazerque, and G. Giannakis, "Distributed sparse linear regression," IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5262–5276, 2010.
[13] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, "Tools for privacy preserving distributed data mining," ACM SIGKDD Explorations, vol. 4, no. 2, pp. 28–34, 2002.
[14] S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective, Academic Press, 2015.
[15] L. Bottou and O. Bousquet, "The tradeoffs of large scale learning," in Advances in Neural Information Processing Systems, vol. 20, 2007, pp. 161–168.
[16] J. F. Mota, J. Xavier, P. M. Aguiar, and M. Puschel, "Distributed basis pursuit," IEEE Transactions on Signal Processing, vol. 60, no. 4, pp. 1942–1956, 2012.
[17] S. Chouvardas, G. Mileounis, N. Kalouptsidis, and S. Theodoridis, "Greedy sparsity-promoting algorithms for distributed learning," IEEE Transactions on Signal Processing, vol. 63, no. 6, pp. 1419–1432, 2015.
[18] S. Patterson, Y. C. Eldar, and I. Keidar, "Distributed compressed sensing for static and time-varying networks," arXiv preprint arXiv:1308.6086, 2013.
[19] P. Di Lorenzo and A. Sayed, "Sparse distributed learning based on diffusion adaptation," IEEE Transactions on Signal Processing, vol. 61, no. 6, pp. 1419–1433, 2013.
[20] S. Chouvardas, K. Slavakis, Y. Kopsinis, and S. Theodoridis, "A sparsity promoting adaptive algorithm for distributed learning," IEEE Transactions on Signal Processing, vol. 60, no. 10, pp. 5412–5425, October 2012.
[21] S. Ono, M. Yamagishi, and I. Yamada, "A sparse system identification by using adaptively-weighted total variation via a primal-dual splitting approach," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013, pp. 6029–6033.
[22] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094–1121, 2012.
[23] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[24] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, pp. 267–288, 1996.
[25] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[26] P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed support vector machines," The Journal of Machine Learning Research, vol. 99, pp. 1663–1707, 2010.
[27] P. A. Forero, A. Cano, and G. B. Giannakis, "Distributed clustering using wireless sensor networks," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 707–724, 2011.
[28] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Athena Scientific, second edition, 1999.
[29] J. A. Tropp, "Greed is good: algorithmic results for sparse approximation," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231–2242, 2004.
[30] T. Peleg, Y. Eldar, and M. Elad, "Exploiting statistical dependencies in sparse representations for signal recovery," IEEE Transactions on Signal Processing, vol. 60, no. 5, pp. 2286–2303, 2012.
[31] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2230–2249, 2009.
[32] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
[33] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Foundations of Computational Mathematics, vol. 9, no. 3, pp. 317–334, 2009.
[34] H. Huang and A. Makur, "Backtracking-based matching pursuit method for sparse signal reconstruction," IEEE Signal Processing Letters, vol. 18, no. 7, pp. 391–394, 2011.
[35] S. Foucart, "Hard thresholding pursuit: an algorithm for compressive sensing," SIAM Journal on Numerical Analysis, vol. 49, no. 6, pp. 2543–2563, 2011.
[36] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," in International Symposium on Information Processing in Sensor Networks (IPSN), IEEE, 2005, pp. 63–70.
[37] C. Ravazzi, S. M. Fosson, and E. Magli, "Distributed soft thresholding for sparse signal recovery," arXiv preprint arXiv:1301.2130, 2013.
[38] C. Lopes and A. Sayed, "Diffusion least-mean squares over adaptive networks: Formulation and performance analysis," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3122–3136, 2008.
[39] F. Cattivelli and A. Sayed, "Diffusion LMS strategies for distributed estimation," IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1035–1048, 2010.
[40] R. L. Cavalcante, I. Yamada, and B. Mulgrew, "An adaptive projected subgradient approach to learning in diffusion networks," IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2762–2774, 2009.
[41] S. Chouvardas, K. Slavakis, and S. Theodoridis, "Adaptive robust distributed learning in diffusion sensor networks," IEEE Transactions on Signal Processing, vol. 59, no. 10, pp. 4692–4707, 2011.
[42] I. Schizas, G. Mateos, and G. Giannakis, "Distributed LMS for consensus-based in-network adaptive processing," IEEE Transactions on Signal Processing, vol. 57, no. 6, pp. 2365–2382, 2009.
[43] G. Mateos, I. D. Schizas, and G. B. Giannakis, "Performance analysis of the consensus-based distributed LMS algorithm," EURASIP Journal on Advances in Signal Processing, vol. 2009, p. 68, 2009.
[44] A. Sayed, Fundamentals of Adaptive Filtering, John Wiley & Sons, New Jersey, 2003.
[45] S. Haykin, Adaptive Filter Theory, Prentice Hall, 1996.
[46] A. Sayed, Adaptive Filters, John Wiley and Sons, 2008.
[47] Y. Chen, Y. Gu, and A. O. Hero, "Sparse LMS for system identification," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2009, pp. 3125–3128.
[48] P. L. Combettes, "The foundations of set theoretic estimation," Proceedings of the IEEE, vol. 81, no. 2, pp. 182–208, 1993.
[49] H. Stark and Y. Yang, Vector Space Projections: A Numerical Approach to Signal and Image Processing, Neural Nets, and Optics, John Wiley & Sons, Inc., 1998.
[50] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1997, vol. 28.
[51] Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Online sparse system identification and signal reconstruction using projections onto weighted balls," IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 936–952, 2011.
[52] I. Yamada and N. Ogura, Adaptive Projected Subgradient Method for Asymptotic Minimization of Sequence of Nonnegative Convex Functions, Taylor & Francis, 2005.
[53] K. Slavakis, I. Yamada, and N. Ogura, "The adaptive projected subgradient method over the fixed point set of strongly attracting nonexpansive mappings," Numerical Functional Analysis and Optimization, vol. 27, no. 7–8, pp. 905–930, 2006.
[54] K. Slavakis and I. Yamada, "The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning," SIAM Journal on Optimization, vol. 23, no. 1, pp. 126–152, 2013.
[55] S. Theodoridis, K. Slavakis, and I. Yamada, "Adaptive learning in a world of projections," IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 97–123, 2011.
[56] S. P. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[57] Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin, "Reduced complexity online sparse signal reconstruction using projections onto weighted l1 balls," in 17th International Conference on Digital Signal Processing (DSP), July 2011, pp. 1–8.
[58] K. Slavakis, Y. Kopsinis, S. Theodoridis, and S. McLaughlin, "Generalized thresholding and online sparsity-aware learning in a union of subspaces," IEEE Transactions on Signal Processing, vol. 61, no. 15, pp. 3760–3773, August 2013.
[59] Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin, "Generalized thresholding sparsity-aware algorithm for low complexity online learning," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, pp. 3277–3280.
[60] M. Yukawa and I. Yamada, "A unified view of adaptive variable-metric projection algorithms," EURASIP Journal on Advances in Signal Processing, vol. 2009, p. 34, 2009.
[61] D. L. Duttweiler, "Proportionate NLMS adaptation in echo cancelers," IEEE Transactions on Speech and Audio Processing, vol. 8, pp. 508–518, 2000.
[62] J. Benesty and S. L. Gay, "An improved PNLMS algorithm," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, May 2002, pp. II-1881–II-1884.
[63] K. Slavakis, Y. Kopsinis, S. Theodoridis, G. B. Giannakis, and V. Kekatos, "Generalized iterative thresholding for sparsity-aware online Volterra system identification," in Proceedings of the Tenth International Symposium on Wireless Communication Systems (ISWCS 2013), VDE, 2013, pp. 1–5.
[64] M. Bhotto and A. Antoniou, "Robust set-membership affine-projection adaptive-filtering algorithm," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 73–81, January 2012.
[65] V. Kekatos and G. Giannakis, "From sparse signals to sparse residuals for robust sensing," IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3355–3368, July 2011.
[66] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," in Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, ACM, 2004, pp. 20–27.
[67] G. Papageorgiou, P. Bouboulis, and S. Theodoridis, "Robust kernel-based regression using orthogonal matching pursuit," in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2013, pp. 1–6.
[68] Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin, "Reduced complexity online sparse signal reconstruction using projections onto weighted l1 balls," in 17th International Conference on Digital Signal Processing (DSP), IEEE, 2011, pp. 1–8.
[69] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), Y. Lechevallier and G. Saporta, Eds., Paris, France: Springer, August 2010, pp. 177–187. [Online]. Available: http://leon.bottou.org/papers/bottou-2010
[70] W. Xu, "Towards optimal one pass large scale learning with averaged stochastic gradient descent," arXiv preprint arXiv:1107.2490, 2011. [Online]. Available: http://arxiv.org/abs/1107.2490
