
References

Published online by Cambridge University Press: 05 May 2022

Daniel A. Roberts, Massachusetts Institute of Technology
Sho Yaida, Meta AI
Boris Hanin, Princeton University, New Jersey
Type: Chapter
In: The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks, pp. 439–444
Publisher: Cambridge University Press
Print publication year: 2022


References

Dirac, P. A. M., The Principles of Quantum Mechanics. No. 27 in The International Series of Monographs on Physics. Oxford University Press, 1930.
von Neumann, J., Mathematical Foundations of Quantum Mechanics. Princeton University Press, 1955.
Carnot, S., Reflections on the Motive Power of Heat and on Machines Fitted to Develop that Power. J. Wiley, 1890. Trans. by Thurston, R. H. from Réflexions sur la puissance motrice du feu et sur les machines propres à développer cette puissance (1824).
Bessarab, M., Landau. Moscow Worker, 1971. Trans. by B. Hanin from the original Russian source. www.ega-math.narod.ru/Landau/Dau1971.htm.
Polchinski, J., “Memories of a Theoretical Physicist,” arXiv:1708.09093 [physics.hist-ph].
Rosenblatt, F., Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, tech. rep., Cornell Aeronautical Lab, Inc., 1961.
McCulloch, W. S. and Pitts, W., “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics 5 no. 4, (1943) 115–133.
Rosenblatt, F., “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review 65 no. 6, (1958) 386.
Krizhevsky, A., Sutskever, I., and Hinton, G. E., “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, (2012) 1097–1105.
Fukushima, K., “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics 36 no. 4, (1980) 193–202.
LeCun, Y., Generalization and Network Design Strategies, tech. rep. CRG-TR-89-4, Department of Computer Science, University of Toronto, 1989.
LeCun, Y., Boser, B., Denker, J. S., et al., “Backpropagation applied to handwritten zip code recognition,” Neural Computation 1 no. 4, (1989) 541–551.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE 86 no. 11, (1998) 2278–2324.
Vaswani, A., Shazeer, N., Parmar, N., et al., “Attention is all you need,” in Advances in Neural Information Processing Systems 30, (2017) 5998–6008. arXiv:1706.03762 [cs.CL].
Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, Rumelhart, D. E., McClelland, J. L., and the PDP Research Group, eds., Ch. 8, MIT Press, 1986, pp. 318–362.
LeCun, Y. A., Bottou, L., Orr, G. B., and Müller, K.-R., “Efficient backprop,” in Neural Networks: Tricks of the Trade, pp. 9–48. Springer, 1998.
Gallant and White, “There exists a neural network that does not make avoidable mistakes,” in IEEE 1988 International Conference on Neural Networks, vol. 1, pp. 657–664. 1988.
Nair, V. and Hinton, G. E., “Rectified linear units improve restricted Boltzmann machines,” in International Conference on Machine Learning. 2010.
Glorot, X., Bordes, A., and Bengio, Y., “Deep sparse rectifier neural networks,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323, JMLR Workshop and Conference Proceedings. 2011.
Maas, A. L., Hannun, A. Y., and Ng, A. Y., “Rectifier nonlinearities improve neural network acoustic models,” in ICML Workshop on Deep Learning for Audio, Speech, and Language Processing. 2013.
Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C., and Garcia, R., “Incorporating second-order functional knowledge for better option pricing,” in Advances in Neural Information Processing Systems 13, (2000) 472–478.
Ramachandran, P., Zoph, B., and Le, Q. V., “Searching for activation functions,” arXiv:1710.05941 [cs.NE].
Hendrycks, D. and Gimpel, K., “Gaussian error linear units (GELUs),” arXiv:1606.08415 [cs.LG].
Turing, A. M., “The chemical basis of morphogenesis,” Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 237 no. 641, (1952) 37–72.
Saxe, A. M., McClelland, J. L., and Ganguli, S., “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks,” arXiv:1312.6120 [cs.NE].
Zavatone-Veth, J. A. and Pehlevan, C., “Exact priors of finite neural networks,” arXiv:2104.11734 [cs.LG].
McGreevy, J., “Holographic duality with a view toward many-body physics,” Advances in High Energy Physics 2010 (2010) 723105, arXiv:0909.0518 [hep-th].
Neal, R. M., “Priors for infinite networks,” in Bayesian Learning for Neural Networks, pp. 29–53. Springer, 1996.
Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Pennington, J., and Sohl-Dickstein, J., “Deep neural networks as Gaussian processes,” in International Conference on Learning Representations. 2018. arXiv:1711.00165 [stat.ML].
de G. Matthews, A. G., Rowland, M., Hron, J., Turner, R. E., and Ghahramani, Z., “Gaussian process behaviour in wide deep neural networks,” in International Conference on Learning Representations. 2018. arXiv:1804.11271 [stat.ML].
Yaida, S., “Non-Gaussian processes and neural networks at finite widths,” in Mathematical and Scientific Machine Learning Conference. 2020. arXiv:1910.00019 [stat.ML].
Dyson, F. J., “The S matrix in quantum electrodynamics,” Physical Review 75 (Jun, 1949) 1736–1755.
Schwinger, J., “On the Green’s functions of quantized fields. I,” Proceedings of the National Academy of Sciences 37 no. 7, (1951) 452–455.
Zeiler, M. D. and Fergus, R., “Visualizing and understanding convolutional networks,” in Computer Vision – ECCV 2014, (2014) 818–833.
Rahimi, A. and Recht, B., “Random features for large-scale kernel machines,” in Advances in Neural Information Processing Systems 20, (2008) 1177–1184.
Devlin, J., Chang, M., Lee, K., and Toutanova, K., “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv:1810.04805 [cs.CL].
Gell-Mann, M. and Low, F. E., “Quantum electrodynamics at small distances,” Physical Review 95 (Sep, 1954) 1300–1312.
Wilson, K. G., “Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture,” Physical Review B 4 (Nov, 1971) 3174–3183.
Wilson, K. G., “Renormalization group and critical phenomena. II. Phase-space cell analysis of critical behavior,” Physical Review B 4 (Nov, 1971) 3184–3205.
Stueckelberg de Breidenbach, E. C. G. and Petermann, A., “Normalization of constants in the quanta theory,” Helvetica Physica Acta 26 (1953) 499–520.
Goldenfeld, N., Lectures on Phase Transitions and the Renormalization Group. CRC Press, 2018.
Cardy, J., Scaling and Renormalization in Statistical Physics. Cambridge Lecture Notes in Physics. Cambridge University Press, 1996.
Minsky, M. and Papert, S. A., Perceptrons: An Introduction to Computational Geometry. MIT Press, 1988.
Coleman, S., Aspects of Symmetry: Selected Erice Lectures. Cambridge University Press, 1985.
Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J., and Ganguli, S., “Exponential expressivity in deep neural networks through transient chaos,” in Advances in Neural Information Processing Systems 29, (2016) 3360–3368. arXiv:1606.05340 [stat.ML].
Raghu, M., Poole, B., Kleinberg, J., Ganguli, S., and Sohl-Dickstein, J., “On the expressive power of deep neural networks,” in International Conference on Machine Learning, pp. 2847–2854. 2017. arXiv:1606.05336 [stat.ML].
Schoenholz, S. S., Gilmer, J., Ganguli, S., and Sohl-Dickstein, J., “Deep information propagation,” in 5th International Conference on Learning Representations. 2017. arXiv:1611.01232 [stat.ML].
He, K., Zhang, X., Ren, S., and Sun, J., “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034. 2015. arXiv:1502.01852 [cs.CV].
Kadanoff, L., “Critical behavior. Universality and scaling,” in Proceedings of the International School of Physics Enrico Fermi, Course LI (27 July – 8 August 1970). 1971.
Jaynes, E. T., Probability Theory: The Logic of Science. Cambridge University Press, 2003.
Froidmont, L., On Christian Philosophy of the Soul. 1649.
MacKay, D. J., “Probable networks and plausible predictions – a review of practical Bayesian methods for supervised neural networks,” Network: Computation in Neural Systems 6 no. 3, (1995) 469–505.
Williams, C. K. I., “Computing with infinite networks,” in Advances in Neural Information Processing Systems 9, (1996) 295–301.
Hinton, G., Vinyals, O., and Dean, J., “Distilling the knowledge in a neural network,” in NIPS Deep Learning and Representation Learning Workshop. 2015. arXiv:1503.02531 [stat.ML].
Hebb, D., The Organization of Behavior: A Neuropsychological Theory. Taylor & Francis, 2005.
Coleman, S., “Sidney Coleman’s Dirac lecture ‘quantum mechanics in your face’,” arXiv:2011.12671 [physics.hist-ph].
Jacot, A., Gabriel, F., and Hongler, C., “Neural tangent kernel: Convergence and generalization in neural networks,” in Advances in Neural Information Processing Systems 31, (2018) 8571–8580. arXiv:1806.07572 [cs.LG].
Hochreiter, S., Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München, 1991.
Bengio, Y., Frasconi, P., and Simard, P., “The problem of learning long-term dependencies in recurrent networks,” in IEEE International Conference on Neural Networks, vol. 3, pp. 1183–1188, IEEE. 1993.
Pascanu, R., Mikolov, T., and Bengio, Y., “On the difficulty of training recurrent neural networks,” in International Conference on Machine Learning, pp. 1310–1318, PMLR. 2013. arXiv:1211.5063 [cs.LG].
Kline, M., Mathematical Thought From Ancient to Modern Times: Volume 3. Oxford University Press, 1990.
Lee, J., Xiao, L., Schoenholz, S., et al., “Wide neural networks of any depth evolve as linear models under gradient descent,” in Advances in Neural Information Processing Systems 32, (2019) 8572–8583. arXiv:1902.06720 [stat.ML].
Fix, E. and Hodges, J., “Discriminatory analysis. Nonparametric discrimination: Consistency properties,” USAF School of Aviation Medicine, Project Number: 21-49-004, Report Number: 4 (1951).
Cover, T. and Hart, P., “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory 13 (1967) 21–27.
Einstein, A., “On the method of theoretical physics,” Philosophy of Science 1 no. 2, (1934) 163–169.
Hanin, B. and Nica, M., “Finite depth and width corrections to the neural tangent kernel,” in International Conference on Learning Representations. 2020. arXiv:1909.05989 [cs.LG].
Dyer, E. and Gur-Ari, G., “Asymptotics of wide networks from Feynman diagrams,” in International Conference on Learning Representations. 2020. arXiv:1909.11304 [cs.LG].
Giudice, G. F., “Naturally speaking: The naturalness criterion and physics at the LHC,” arXiv:0801.2562 [hep-ph].
Dyson, F. J., “Foreword,” in Classic Feynman: All the Adventures of a Curious Character, Leighton, R., ed., W. W. Norton & Company Ltd., 2006, pp. 5–9.
Chizat, L., Oyallon, E., and Bach, F., “On lazy training in differentiable programming,” in Advances in Neural Information Processing Systems 32, (2019) 2937–2947. arXiv:1812.07956 [math.OC].
MacKay, D. J., Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
Kaplan, J., McCandlish, S., Henighan, T., et al., “Scaling laws for neural language models,” arXiv:2001.08361 [cs.LG].
Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O., “Understanding deep learning requires rethinking generalization,” arXiv:1611.03530 [cs.LG].
Wolpert, D. H., “The lack of a priori distinctions between learning algorithms,” Neural Computation 8 no. 7, (1996) 1341–1390.
Wolpert, D. H. and Macready, W. G., “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation 1 no. 1, (1997) 67–82.
Boltzmann, L., “On certain questions of the theory of gases,” Nature 51 no. 1322, (1895) 413–415.
Boltzmann, L., Lectures on Gas Theory. University of California Press, 1964. Trans. by S. G. Brush from Vorlesungen über Gastheorie (2 vols., 1896 & 1898).
Shannon, C. E., “A mathematical theory of communication,” The Bell System Technical Journal 27 no. 3, (1948) 379–423.
Shannon, C. E., “A mathematical theory of communication,” The Bell System Technical Journal 27 no. 4, (1948) 623–656.
Jaynes, E. T., “Information theory and statistical mechanics,” Physical Review 106 (May, 1957) 620–630.
Jaynes, E. T., “Information theory and statistical mechanics. II,” Physical Review 108 (Oct, 1957) 171–190.
LeCun, Y., Denker, J., and Solla, S., “Optimal brain damage,” in Advances in Neural Information Processing Systems 2, (1990) 598–605.
Frankle, J. and Carbin, M., “The lottery ticket hypothesis: Finding sparse, trainable neural networks,” in International Conference on Learning Representations. 2019. arXiv:1803.03635 [cs.LG].
Banks, T. and Zaks, A., “On the phase structure of vector-like gauge theories with massless fermions,” Nuclear Physics B 196 no. 2, (1982) 189–204.
Linsker, R., “Self-organization in a perceptual network,” Computer 21 no. 3, (1988) 105–117.
Becker, S. and Hinton, G. E., “Self-organizing neural network that discovers surfaces in random-dot stereograms,” Nature 355 no. 6356, (1992) 161–163.
Gale, B., Zemeckis, R., Fox, M. J., and Lloyd, C., Back to the Future Part II. Universal Pictures, 1989.
He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. 2016.
Ioffe, S. and Szegedy, C., “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, pp. 448–456. 2015. arXiv:1502.03167 [cs.LG].
Veit, A., Wilber, M., and Belongie, S., “Residual networks behave like ensembles of relatively shallow networks,” in Advances in Neural Information Processing Systems 30, (2016) 550–558. arXiv:1605.06431 [cs.CV].
Ba, J. L., Kiros, J. R., and Hinton, G. E., “Layer normalization,” in Deep Learning Symposium, Neural Information Processing Systems. 2016. arXiv:1607.06450 [stat.ML].
