  • Print publication year: 2016
  • Online publication date: December 2015

2 - Sparsity-aware distributed learning

from Part I - Mathematical foundations

Summary

In this chapter, the problem of sparsity-aware distributed learning is studied. In particular, we consider the setup of an ad hoc network whose nodes are tasked to estimate, in a collaborative way, a sparse parameter vector of interest. Both batch and online algorithms will be discussed. In the batch learning context, the distributed LASSO algorithm and a distributed greedy technique will be presented. Furthermore, an LMS-based sparsity-promoting algorithm, built around the l1 norm, as well as a greedy distributed LMS will be discussed. Moreover, a set-theoretic sparsity-promoting distributed technique will be examined. Finally, the performance of the presented algorithms will be validated in several scenarios.
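To give a rough feel for the flavor of such online, sparsity-promoting distributed schemes, the following minimal sketch combines a standard diffusion LMS recursion (adapt-then-combine) with an element-wise soft-thresholding step that attracts small coefficients toward zero. The function names, the uniform combination weights, and the parameter values are illustrative assumptions and not the exact algorithms developed in this chapter.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_diffusion_lms(adjacency, regressors, measurements, mu=0.01, lam=1e-3):
    """Illustrative sketch of a sparsity-promoting diffusion LMS (adapt-then-combine).

    adjacency    : (K, K) 0/1 matrix; adjacency[k, l] = 1 if node l is a
                   neighbor of node k (self-loops included).
    regressors   : (T, K, m) array of input vectors, one per time step and node.
    measurements : (T, K) array of scalar observations d_k(t).
    """
    T, K, m = regressors.shape
    w = np.zeros((K, m))                         # current estimates at all nodes
    # Uniform (averaging) combination weights over each neighborhood.
    C = adjacency / adjacency.sum(axis=1, keepdims=True)
    for t in range(T):
        psi = np.zeros_like(w)
        for k in range(K):                       # adaptation step at each node
            x = regressors[t, k]
            e = measurements[t, k] - x @ w[k]    # a priori estimation error
            # LMS step followed by soft thresholding, which shrinks small
            # coefficients toward zero (l1-type regularization).
            psi[k] = soft_threshold(w[k] + mu * e * x, mu * lam)
        w = C @ psi                              # combination step over neighbors
    return w
```

The cooperation pattern is the typical one in diffusion networks: each node first adapts locally on its own data and then averages the intermediate estimates available in its neighborhood.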

Introduction

The volume of data captured worldwide is growing at an exponential rate, posing significant challenges for processing and analysis. Data mining, regression, and prediction/forecasting have played a leading role in gaining insights and extracting useful information from raw data. Such techniques cover a wide range of applications in areas such as biomedicine, econometrics, sales forecasting, and content preference modeling. The massive amount of data produced, together with their increased complexity (new types of data keep emerging) and their central role in the Internet of Things paradigm [1], calls for further advances in established machine learning techniques in order to cope with the new challenges.

Even though data tend to live in high-dimensional spaces, they often exhibit a high degree of redundancy; that is, their useful information can be represented using a number of attributes much smaller than their original dimensionality. This redundancy can often be exploited effectively by treating the data in a transformed domain, in which they admit sparse models, i.e., models comprising only a few nonzero parameters. Moreover, sparsity is an attribute encountered in a plethora of models of natural signals, since nature tends to be parsimonious. Such sparse structures can be exploited in big data applications in order to reduce processing demands. The advent of compressed sensing has led to novel theoretical as well as algorithmic tools, which can be efficiently employed for sparsity-aware learning; see, e.g., [2–7].
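As a toy illustration of this point, the sketch below recovers a sparse parameter vector from far fewer linear measurements than unknowns by approximately solving the LASSO problem with a plain iterative soft-thresholding (ISTA) loop. The problem dimensions, the regularization value, and the function name are illustrative assumptions, not a prescription from the chapter.

```python
import numpy as np

def ista_lasso(A, y, lam=0.1, mu=None, iters=500):
    """Minimal ISTA sketch for the LASSO problem
        min_w 0.5 * ||y - A w||_2^2 + lam * ||w||_1.
    The step size mu defaults to 1 / ||A||_2^2, which guarantees convergence."""
    if mu is None:
        mu = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        z = w - mu * (A.T @ (A @ w - y))        # gradient step on the LS term
        w = np.sign(z) * np.maximum(np.abs(z) - mu * lam, 0.0)  # soft threshold
    return w

# Toy example: a 200-dimensional vector with only 5 nonzero entries,
# observed through 60 noisy random linear measurements.
rng = np.random.default_rng(0)
w_true = np.zeros(200)
w_true[rng.choice(200, 5, replace=False)] = rng.standard_normal(5)
A = rng.standard_normal((60, 200)) / np.sqrt(60)
y = A @ w_true + 0.01 * rng.standard_normal(60)
w_hat = ista_lasso(A, y, lam=0.01)
print(np.linalg.norm(w_hat - w_true))  # the estimate should lie close to w_true
```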

In many cases, processing large amounts of data is not only cumbersome but may prove infeasible due to a lack of processing power and/or storage capacity.

References

[1] L. Atzori, A. Iera, and G. Morabito, "The internet of things: A survey," Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[2] E. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies," IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
[3] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete Fourier information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[4] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[5] E. J. Candès, M. B. Wakin, and S. P. Boyd, "Enhancing sparsity by reweighted l1 minimization," The Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
[6] M. F. Duarte and Y. Eldar, "Structured compressed sensing: From theory to applications," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4053–4085, 2011.
[7] S. Theodoridis, Y. Kopsinis, and K. Slavakis, "Sparsity-aware learning and compressed sensing: An overview," 2014, to appear in the E-Reference Signal Processing, Elsevier.
[8] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun, "Map-reduce for machine learning on multicore," in NIPS, vol. 6, 2006, pp. 281–288.
[9] J. Lin, "MapReduce is good enough? If all you have is a hammer, throw away everything that's not a nail!" Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[10] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: A survey," Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.
[11] A. H. Sayed, "Diffusion adaptation over networks," arXiv preprint arXiv:1205.4220, 2012.
[12] G. Mateos, J. Bazerque, and G. Giannakis, "Distributed sparse linear regression," IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5262–5276, 2010.
[13] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, "Tools for privacy preserving distributed data mining," ACM SIGKDD Explorations, vol. 4, no. 2, pp. 28–34, 2002.
[14] S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective, Academic Press, 2015.
[15] L. Bottou and O. Bousquet, "The tradeoffs of large scale learning," in Advances in Neural Information Processing Systems, vol. 20, 2007, pp. 161–168.
[16] J. F. Mota, J. Xavier, P. M. Aguiar, and M. Puschel, "Distributed basis pursuit," IEEE Transactions on Signal Processing, vol. 60, no. 4, pp. 1942–1956, 2012.
[17] S. Chouvardas, G. Mileounis, N. Kalouptsidis, and S. Theodoridis, "Greedy sparsity-promoting algorithms for distributed learning," IEEE Transactions on Signal Processing, vol. 63, no. 6, pp. 1419–1432, 2015.
[18] S. Patterson, Y. C. Eldar, and I. Keidar, "Distributed compressed sensing for static and time-varying networks," arXiv preprint arXiv:1308.6086, 2013.
[19] P. Di Lorenzo and A. Sayed, "Sparse distributed learning based on diffusion adaptation," IEEE Transactions on Signal Processing, vol. 61, no. 6, pp. 1419–1433, 2013.
[20] S. Chouvardas, K. Slavakis, Y. Kopsinis, and S. Theodoridis, "A sparsity promoting adaptive algorithm for distributed learning," IEEE Transactions on Signal Processing, vol. 60, no. 10, pp. 5412–5425, 2012.
[21] S. Ono, M. Yamagishi, and I. Yamada, "A sparse system identification by using adaptively-weighted total variation via a primal-dual splitting approach," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6029–6033.
[22] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094–1121, 2012.
[23] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[24] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, pp. 267–288, 1996.
[25] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[26] P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed support vector machines," The Journal of Machine Learning Research, vol. 99, pp. 1663–1707, 2010.
[27] P. A. Forero, A. Cano, and G. B. Giannakis, "Distributed clustering using wireless sensor networks," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 707–724, 2011.
[28] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Athena Scientific, second edition, 1999.
[29] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231–2242, 2004.
[30] T. Peleg, Y. Eldar, and M. Elad, "Exploiting statistical dependencies in sparse representations for signal recovery," IEEE Transactions on Signal Processing, vol. 60, no. 5, pp. 2286–2303, 2012.
[31] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2230–2249, 2009.
[32] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
[33] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Foundations of Computational Mathematics, vol. 9, no. 3, pp. 317–334, 2009.
[34] H. Huang and A. Makur, "Backtracking-based matching pursuit method for sparse signal reconstruction," IEEE Signal Processing Letters, vol. 18, no. 7, pp. 391–394, 2011.
[35] S. Foucart, "Hard thresholding pursuit: An algorithm for compressive sensing," SIAM Journal on Numerical Analysis, vol. 49, no. 6, pp. 2543–2563, 2011.
[36] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," in International Symposium on Information Processing in Sensor Networks (IPSN), IEEE, 2005, pp. 63–70.
[37] C. Ravazzi, S. M. Fosson, and E. Magli, "Distributed soft thresholding for sparse signal recovery," arXiv preprint arXiv:1301.2130, 2013.
[38] C. Lopes and A. Sayed, "Diffusion least-mean squares over adaptive networks: Formulation and performance analysis," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3122–3136, 2008.
[39] F. Cattivelli and A. Sayed, "Diffusion LMS strategies for distributed estimation," IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1035–1048, 2010.
[40] R. L. Cavalcante, I. Yamada, and B. Mulgrew, "An adaptive projected subgradient approach to learning in diffusion networks," IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2762–2774, 2009.
[41] S. Chouvardas, K. Slavakis, and S. Theodoridis, "Adaptive robust distributed learning in diffusion sensor networks," IEEE Transactions on Signal Processing, vol. 59, no. 10, pp. 4692–4707, 2011.
[42] I. Schizas, G. Mateos, and G. Giannakis, "Distributed LMS for consensus-based in-network adaptive processing," IEEE Transactions on Signal Processing, vol. 57, no. 6, pp. 2365–2382, 2009.
[43] G. Mateos, I. D. Schizas, and G. B. Giannakis, "Performance analysis of the consensus-based distributed LMS algorithm," EURASIP Journal on Advances in Signal Processing, vol. 2009, p. 68, 2009.
[44] A. Sayed, Fundamentals of Adaptive Filtering, John Wiley & Sons, New Jersey, 2003.
[45] S. Haykin, Adaptive Filter Theory, Prentice Hall, 1996.
[46] A. Sayed, Adaptive Filters, John Wiley & Sons, 2008.
[47] Y. Chen, Y. Gu, and A. O. Hero, "Sparse LMS for system identification," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009, pp. 3125–3128.
[48] P. L. Combettes, "The foundations of set theoretic estimation," Proceedings of the IEEE, vol. 81, no. 2, pp. 182–208, 1993.
[49] H. Stark and Y. Yang, Vector Space Projections: A Numerical Approach to Signal and Image Processing, Neural Nets, and Optics, John Wiley & Sons, 1998.
[50] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1997, vol. 28.
[51] Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Online sparse system identification and signal reconstruction using projections onto weighted balls," IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 936–952, 2011.
[52] I. Yamada and N. Ogura, Adaptive Projected Subgradient Method for Asymptotic Minimization of Sequence of Nonnegative Convex Functions, Taylor & Francis, 2005.
[53] K. Slavakis, I. Yamada, and N. Ogura, "The adaptive projected subgradient method over the fixed point set of strongly attracting nonexpansive mappings," Numerical Functional Analysis and Optimization, vol. 27, no. 7–8, pp. 905–930, 2006.
[54] K. Slavakis and I. Yamada, "The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning," SIAM Journal on Optimization, vol. 23, no. 1, pp. 126–152, 2013.
[55] S. Theodoridis, K. Slavakis, and I. Yamada, "Adaptive learning in a world of projections," IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 97–123, 2011.
[56] S. P. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[57] Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin, "Reduced complexity online sparse signal reconstruction using projections onto weighted l1 balls," in 17th International Conference on Digital Signal Processing (DSP), July 2011, pp. 1–8.
[58] K. Slavakis, Y. Kopsinis, S. Theodoridis, and S. McLaughlin, "Generalized thresholding and online sparsity-aware learning in a union of subspaces," IEEE Transactions on Signal Processing, vol. 61, no. 15, pp. 3760–3773, 2013.
[59] Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin, "Generalized thresholding sparsity-aware algorithm for low complexity online learning," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, pp. 3277–3280.
[60] M. Yukawa and I. Yamada, "A unified view of adaptive variable-metric projection algorithms," EURASIP Journal on Advances in Signal Processing, vol. 2009, p. 34, 2009.
[61] D. L. Duttweiler, "Proportionate NLMS adaptation in echo cancelers," IEEE Transactions on Speech and Audio Processing, vol. 8, pp. 508–518, 2000.
[62] J. Benesty and S. L. Gay, "An improved PNLMS algorithm," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, May 2002, pp. II-1881–II-1884.
[63] K. Slavakis, Y. Kopsinis, S. Theodoridis, G. B. Giannakis, and V. Kekatos, "Generalized iterative thresholding for sparsity-aware online Volterra system identification," in Proceedings of the Tenth International Symposium on Wireless Communication Systems (ISWCS), 2013, pp. 1–5.
[64] M. Bhotto and A. Antoniou, "Robust set-membership affine-projection adaptive-filtering algorithm," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 73–81, 2012.
[65] V. Kekatos and G. Giannakis, "From sparse signals to sparse residuals for robust sensing," IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3355–3368, 2011.
[66] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," in Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, ACM, 2004, pp. 20–27.
[67] G. Papageorgiou, P. Bouboulis, and S. Theodoridis, "Robust kernel-based regression using orthogonal matching pursuit," in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2013, pp. 1–6.
[68] Y. Kopsinis, K. Slavakis, S. Theodoridis, and S. McLaughlin, "Reduced complexity online sparse signal reconstruction using projections onto weighted l1 balls," in 17th International Conference on Digital Signal Processing (DSP), IEEE, 2011, pp. 1–8.
[69] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), Y. Lechevallier and G. Saporta, Eds., Paris, France: Springer, August 2010, pp. 177–187. [Online]. Available: http://leon.bottou.org/papers/bottou-2010
[70] W. Xu, "Towards optimal one pass large scale learning with averaged stochastic gradient descent," arXiv preprint arXiv:1107.2490, 2011. [Online]. Available: http://arxiv.org/abs/1107.2490