  • Print publication year: 2021
  • Online publication date: March 2021

1 - Introduction to Information Theory and Data Science

Summary

The purpose of this chapter is to set the stage for the book and for the chapters that follow. We first provide an overview of classical information-theoretic problems and their solutions. We then discuss emerging applications of information-theoretic methods to various data-science problems and, where applicable, refer the reader to related chapters in the book. Throughout, we highlight the perspectives, tools, and methods that play important roles both in classical information-theoretic paradigms and in emerging areas of data science. Table 1.1 summarizes the topics covered in this chapter and indicates the chapters that can be read as a follow-up to each topic.
