Hostname: page-component-8448b6f56d-xtgtn Total loading time: 0 Render date: 2024-04-19T03:19:23.042Z Has data issue: false hasContentIssue false

Application of a Generalization of Russo's Formula to Learning from Multiple Random Oracles

Published online by Cambridge University Press:  09 July 2009

JAN ARPE
Affiliation:
Department of Statistics, UC Berkeley, CA 94720, USA and Bertelsmann Stiftung, Carl-Bertelsmann-Strasse 256, 33311 Gütersloh, Germany (e-mail: jan.arpe@bertelsmann.de)
ELCHANAN MOSSEL
Affiliation:
Departments of Statistics and Computer Science, UC Berkeley, CA 94720, USA and Department of Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel (e-mail: mossel@stat.berkeley.edu)

Abstract

We study the problem of learning k-juntas given access to examples drawn from a number of different product distributions. Thus we wish to learn a function f: {−1, 1}n → {−1, 1} that depends on k (unknown) coordinates. While the best-known algorithms for the general problem of learning a k-junta require running times of nk poly(n, 2k), we show that, given access to k different product distributions with biases separated by γ > 0, the functions may be learned in time poly(n, 2k, γk). More generally, given access to tk different product distributions, the functions may be learned in time nk/tpoly(n, 2k, γk). Our techniques involve novel results in Fourier analysis, relating Fourier expansions with respect to different biases, and a generalization of Russo's formula.

Type
Paper
Copyright
Copyright © Cambridge University Press 2009

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

[1]Almuallim, H. and Dietterich, T. G. (1994) Learning Boolean concepts in the presence of many irrelevant features. Artificial Intelligence 69 279305.Google Scholar
[2]Arpe, J. and Reischuk, R. (2007) Learning juntas in the presence of noise. Theoret. Comput. Sci. 384 221.Google Scholar
[3]Atıcı, A. and Servedio, R. A. (2007) Quantum algorithms for learning and testing juntas. Quantum Inf. Process. 6 323348.Google Scholar
[4]Blum, A. (2003) Learning a function of r relevant variables. In Computational Learning Theory and Kernel Machines: COLT/Kernel 2003 (Schölkopf, B. and Warmuth, M K., eds), Vol. 2777 of Lecture Notes in Computer Science, Springer, pp. 731733.Google Scholar
[5]Blum, A., Furst, M., Jackson, J. C., Kearns, M., Mansour, Y. and Rudich, S. (1994) Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proc. 26th Annual ACM Symposium on Theory of Computing: STOC '94 (Montreal), ACM Press, pp. 253262.Google Scholar
[6]Blum, A. and Langley, P. (1997) Selection of relevant features and examples in machine learning. Artificial Intelligence 97 245271.Google Scholar
[7]Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M. K. (1989) Learnability and the Vapnik–Chervonenkis dimension. J. Assoc. Comput. Mach. 36 929965.Google Scholar
[8]Bshouty, N. H. and Feldman, V. (2002) On using extended statistical queries to avoid membership queries. J. Mach. Learn. Res. 2 359396.Google Scholar
[9]Bshouty, N. H. and Tamon, C. (1996) On the Fourier spectrum of monotone functions. J. Assoc. Comput. Mach. 43 747770.Google Scholar
[10]Crammer, K., Kearns, M. and Wortman, J. (2005) Learning from data of variable quality. In Advances in Neural Information Processing Systems 18: NIPS 2005 (Vancouver), MIT Press.Google Scholar
[11]Crammer, K., Kearns, M. and Wortman, J. (2006) Learning from multiple sources. In Advances in Neural Information Processing Systems 19: NIPS '06 (Vancouver), MIT Press, pp. 321328.Google Scholar
[12]Dooly, D. R., Zhang, Q., Goldman, S. A. and Amar, R. A. (2002) Multiple-instance learning of real-valued data. J. Mach. Learn. Res. 3 651678.Google Scholar
[13]Feldman, J., O'Donnell, R. and Servedio, R. A. (2005) Learning mixtures of product distributions over discrete domains. In 46th Annual IEEE Symposium on Foundations of Computer Science: FOCS '05, IEEE Press, pp. 501510.Google Scholar
[14]Furst, M. L., Jackson, J. C. and Smith, S. W. (1991) Improved learning of AC 0 functions. In Proc. 4th Annual Workshop on Computational Learning Theory: COLT 1991 (Valiant, L. G. and Warmuth, M. K., eds), Morgan Kaufmann, pp. 317325.Google Scholar
[15]Grimmett, G. (1999) Percolation, 2nd edn, Grundlehren der mathematischen Wissenschaften, Springer,Google Scholar
[16]Hancock, T. R. and Mansour, Y. (1991) Learning monotone k-μ DNF formulas on product distributions. In Proc. 4th Annual Workshop on Computational Learning Theory: COLT 1991 (Valiant, L. G. and Warmuth, M. K., eds), Morgan Kaufmann, pp. 179183.Google Scholar
[17]Hoeffding, W. (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 1330.Google Scholar
[18]Kearns, M. (1998) Efficient noise-tolerant learning from statistical queries. J. Assoc. Comput. Mach. 45 9831006.Google Scholar
[19]Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R. E. and Sellie, L. (1994) On the learnability of discrete distributions. In Proc. 26th Annual ACM Symposium on Theory of Computing: STOC '94 (Montreal), ACM Press, pp. 273282.Google Scholar
[20]Kolountzakis, M. N., Markakis, E. and Mehta, A. (2005) Learning symmetric juntas in time no(k). In Workshop on Interface between Harmonic Analysis and Number Theory (Marseille 2005). Available as technical report at http://arxiv.org/abs/math.CO/0504246v1.Google Scholar
[21]Ling, C. X. and Yang, Q. (2006) Discovering classification from data of multiple sources. Data Min. Knowl. Discov. 12 181201.Google Scholar
[22]Lipton, R. J., Markakis, E., Mehta, A. and Vishnoi, N. K. (2005) On the Fourier spectrum of symmetric Boolean functions with applications to learning symmetric juntas. In 20th Annual IEEE Conference on Computational Complexity: CCC '05, IEEE Computer Society Press, pp. 112119.Google Scholar
[23]Mossel, E., O'Donnell, R. W. and Servedio, R. A. (2004) Learning functions of k relevant variables. J. Comput. System Sci. 69 421434.Google Scholar
[24]Russo, L. (1981) On the critical percolation probabilities. Z. Wahrscheinlichkeitstheorie verw. Gebiete 56 229237.Google Scholar
[25]Servedio, R. A. (2004) On learning monotone DNF under product distributions. Inform. Comput. 193 5774.Google Scholar
[26]Turán, G. (1993) Lower bounds for PAC learning with queries. In Proc. 6th Annual ACM Conference on Computational Learning Theory: COLT 1993 (Valiant, L. G. and Warmuth, M. K., eds), ACM Press, pp. 384391.Google Scholar
[27]Valiant, L. G. (1984) A theory of the learnable. Commun. Assoc. Comput. Mach. 27 11341142.Google Scholar
[28]Vempala, S. and Wang, G. (2002) A spectral algorithm for learning mixtures of distributions. In 43rd Symposium on Foundations of Computer Science: FOCS 2002 (Vancouver), IEEE Press, p. 113.Google Scholar