
References

Published online by Cambridge University Press:  01 June 2022

Philipp Hennig, Eberhard-Karls-Universität Tübingen, Germany
Michael A. Osborne, University of Oxford
Hans P. Kersting, Ecole Normale Supérieure, Paris

Type: Chapter
In: Probabilistic Numerics: Computation as Machine Learning, pp. 369–394
Publisher: Cambridge University Press
Print publication year: 2022



Abdulle, A. and Garegnani, G. “Random time step probabilistic methods for uncertainty quantification in chaotic and geometric numerical integration”. Statistics and Computing 30.4 (2020), pp. 907–932.
Abdulle, A. and Garegnani, G. “A probabilistic finite element method based on random meshes: A posteriori error estimators and Bayesian inverse problems”. Computer Methods in Applied Mechanics and Engineering 384 (2021), p. 113961.
Adler, R. The Geometry of Random Fields. Wiley, 1981.
Adler, R. An introduction to continuity, extrema, and related topics for general Gaussian processes. Vol. 12. Lecture Notes-Monograph Series. Institute of Mathematical Statistics, 1990.
Ajne, B. and Dalenius, T. “Några tillämpningar av statistiska idéer på numerisk integration” [Some applications of statistical ideas to numerical integration]. Nordisk Matematisk Tidskrift 8.4 (1960), pp. 145–152.
Akhiezer, N. I. and Glazman, I. M. Theory of linear operators in Hilbert space. Vols. I & II. Courier Corporation, 2013.
Alizadeh, F., Haeberly, J.-P. A., and Overton, M. L. “Primal-dual interior-point methods for semidefinite programming: Convergence rates, stability and numerical results”. SIAM Journal on Optimization (1998), pp. 746–768.
Anderson, B. and Moore, J. Optimal Filtering. Prentice-Hall, 1979.
Anderson, E. et al. LAPACK Users’ Guide. 3rd edition. Society for Industrial and Applied Mathematics (SIAM), 1999.
Arcangéli, R., López de Silanes, M. C., and Torrens, J. J. “An extension of a bound for functions in Sobolev spaces, with applications to (m, s)-spline interpolation and smoothing”. Numerische Mathematik 107.2 (2007), pp. 181–211.
Armijo, L. “Minimization of functions having Lipschitz continuous first partial derivatives”. Pacific Journal of Mathematics (1966), pp. 1–3.
Arnold, V. I. Ordinary Differential Equations. Universitext. Springer, 1992.
Arnoldi, W. “The principle of minimized iterations in the solution of the matrix eigenvalue problem”. Quarterly of Applied Mathematics 9.1 (1951), pp. 17–29.
Aronszajn, N. “Theory of reproducing kernels”. Transactions of the AMS (1950), pp. 337–404.
Arvanitidis, G. et al. “Fast and Robust Shortest Paths on Manifolds Learned from Data”. The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 89. Proceedings of Machine Learning Research. PMLR, 2019, pp. 1506–1515.
Azimi, J., Fern, A., and Fern, X. Z. “Batch Bayesian Optimization via Simulation Matching”. Advances in Neural Information Processing Systems, NeurIPS. Curran Associates, Inc., 2010, pp. 109–117.
Azimi, J., Jalali, A., and Fern, X. Z. “Hybrid Batch Bayesian Optimization”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Bach, F. “On the equivalence between kernel quadrature rules and random feature expansions”. Journal of Machine Learning Research (JMLR) 18.21 (2017), pp. 1–38.
Bach, F., Lacoste-Julien, S., and Obozinski, G. “On the Equivalence between Herding and Conditional Gradient Algorithms”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Baker, C. The numerical treatment of integral equations. Oxford: Clarendon Press, 1973.
Balles, L. and Hennig, P. “Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients”. Proceedings of the 35th International Conference on Machine Learning, ICML. Vol. 80. Proceedings of Machine Learning Research. PMLR, 2018, pp. 413–422.
Balles, L., Romero, J., and Hennig, P. “Coupling Adaptive Batch Sizes with Learning Rates”. Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2017.
Bapat, R. Nonnegative Matrices and Applications. Cambridge University Press, 1997.
Barber, D. Bayesian reasoning and machine learning. Cambridge University Press, 2012.
Bardenet, R. and Hardy, A. “Monte Carlo with Determinantal Point Processes”. Annals of Applied Probability (2019).
Bartels, S. et al. “Probabilistic linear solvers: a unifying view”. Statistics and Computing 29.6 (2019), pp. 1249–1263.
Belhadji, A., Bardenet, R., and Chainais, P. “Kernel quadrature with DPPs”. Advances in Neural Information Processing Systems, NeurIPS. 2019, pp. 12907–12917.
Bell, B. M. “The Iterated Kalman Smoother as a Gauss–Newton Method”. SIAM Journal on Optimization 4.3 (1994), pp. 626–636.
Bell, B. M. and Cathey, F. W. “The iterated Kalman filter update as a Gauss–Newton method”. IEEE Transactions on Automatic Control 38.2 (1993), pp. 294–297.
Benoit. “Note sur une méthode de résolution des équations normales provenant de l’application de la méthode des moindres carrés à un système d’équations linéaires en nombre inférieur à celui des inconnues. Application de la méthode à la résolution d’un système défini d’équations linéaires. (Procédé du Commandant Cholesky)” [Note on a method for solving the normal equations arising from applying least squares to a system of linear equations fewer in number than the unknowns; application of the method to solving a determined system of linear equations (Commandant Cholesky’s procedure)]. Bulletin Géodésique (1924), pp. 67–77.
Berg, C., Christensen, J., and Ressel, P. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer, 1984.
Bergstra, J. et al. “Algorithms for Hyper-Parameter Optimization”. Advances in Neural Information Processing Systems, NeurIPS. 2011, pp. 2546–2554.
Bertsekas, D. Nonlinear programming. Athena Scientific, 1999.
Bettencourt, J., Johnson, M., and Duvenaud, D. “Taylor-mode automatic differentiation for higher-order derivatives”. NeurIPS 2019 Workshop Program Transformations. 2019.
Bini, D., Iannazzo, B., and Meini, B. Numerical solution of algebraic Riccati equations. SIAM, 2011.
Bishop, C. Pattern Recognition and Machine Learning. Springer, 2006.
Björck, Å. Numerical Methods in Matrix Computations. Springer, 2015.
Blight, B. J. N. and Ott, L. “A Bayesian Approach to Model Inadequacy for Polynomial Regression”. Biometrika 62.1 (1975), pp. 79–88.
Borodin, A. N. and Salminen, P. Handbook of Brownian Motion – Facts and Formulae. 2nd edition. Probability and Its Applications. Birkhäuser Basel, 2002.
Bosch, N., Hennig, P., and Tronarp, F. “Calibrated adaptive probabilistic ODE solvers”. Artificial Intelligence and Statistics (AISTATS). 2021, pp. 3466–3474.
Bosch, N., Tronarp, F., and Hennig, P. “Pick-and-Mix Information Operators for Probabilistic ODE Solvers”. Artificial Intelligence and Statistics (AISTATS). 2022.
Bottou, L., Curtis, F. E., and Nocedal, J. “Optimization Methods for Large-Scale Machine Learning”. arXiv:1606.04838 [stat.ML] (2016).
Bougerol, P. “Kalman filtering with random coefficients and contractions”. SIAM Journal on Control and Optimization 31.4 (1993), pp. 942–959.
Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.
Bretthorst, G. Bayesian Spectrum Analysis and Parameter Estimation. Vol. 48. Lecture Notes in Statistics. Springer, 1988.
Briol, F. et al. “Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees”. Advances in Neural Information Processing Systems, NeurIPS. 2015, pp. 1162–1170.
Briol, F.-X. et al. “Probabilistic Integration: A Role in Statistical Computation?” Statistical Science 34.1 (2019), pp. 1–22.
Broyden, C. “A new double-rank minimization algorithm”. Notices of the AMS 16 (1969), p. 670.
Butcher, J. C. Numerical Methods for Ordinary Differential Equations. 3rd edition. John Wiley & Sons, 2016.
Calandra, R. et al. “Bayesian Gait Optimization for Bipedal Locomotion”. Learning and Intelligent OptimizatioN (LION8). 2014a, pp. 274–290.
Calandra, R. et al. “An experimental comparison of Bayesian optimization for bipedal locomotion”. Proceedings of the International Conference on Robotics and Automation (ICRA). 2014b.
Cashore, J. M., Kumarga, L., and Frazier, P. I. “Multi-Step Bayesian Optimization for One-Dimensional Feasibility Determination” (2015).
Chai, H. R. and Garnett, R. “Improving Quadrature for Constrained Integrands”. The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 89. Proceedings of Machine Learning Research. PMLR, 2019, pp. 2751–2759.
Chen, R. T. Q. et al. “Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering”. Proceedings on “I Can’t Believe It’s Not Better!” at NeurIPS Workshops. Vol. 137. Proceedings of Machine Learning Research. PMLR, 2020, pp. 60–69.
Chen, Y., Welling, M., and Smola, A. J. “Super-Samples from Kernel Herding”. Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2010, pp. 109–116.
Chen, Y. et al. “Bayesian optimization in AlphaGo”. arXiv:1812.06855 [cs.LG] (2018).
Chevalier, C. and Ginsbourger, D. “Fast Computation of the Multi-Points Expected Improvement with Applications in Batch Selection”. Learning and Intelligent Optimization. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, pp. 59–69.
Chkrebtii, O. A. and Campbell, D. “Adaptive step-size selection for state-space probabilistic differential equation solvers”. Statistics and Computing 29 (2019), pp. 1285–1295.
Chkrebtii, O. A. et al. “Bayesian solution uncertainty quantification for differential equations”. Bayesian Analysis 11.4 (2016), pp. 1239–1267.
Church, A. “On the concept of a random sequence”. Bulletin of the AMS 46.2 (1940), pp. 130–135.
Cockayne, J. et al. “Probabilistic Numerical Methods for Partial Differential Equations and Bayesian Inverse Problems”. arXiv:1605.07811v3 [stat.ME] (2017a).
Cockayne, J. et al. “A Bayesian conjugate gradient method (with discussion)”. Bayesian Analysis 14.3 (2019a), pp. 937–1012.
Cockayne, J. et al. “Bayesian Probabilistic Numerical Methods”. SIAM Review 61.4 (2019b), pp. 756–789.
Cockayne, J. et al. “Probabilistic numerical methods for PDE-constrained Bayesian inverse problems”. AIP Conference Proceedings 1853.1 (2017b), p. 060001.
Colas, C., Sigaud, O., and Oudeyer, P.-Y. “How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments”. arXiv:1806.08295 [cs.LG] (2018).
Conrad, P. R. et al. “Statistical analysis of differential equations: introducing probability measures on numerical solutions”. Statistics and Computing 27.4 (2017), pp. 1065–1082.
Cottle, R. “Manifestations of the Schur complement”. Linear Algebra and its Applications 8 (1974), pp. 189–211.
Cox, R. “Probability, frequency and reasonable expectation”. American Journal of Physics 14.1 (1946), pp. 1–13.
Cranmer, K., Brehmer, J., and Louppe, G. “The frontier of simulation-based inference”. Proceedings of the National Academy of Sciences (2020).
Cunningham, J., Hennig, P., and Lacoste-Julien, S. “Gaussian Probabilities and Expectation Propagation”. arXiv:1111.6832 [stat.ML] (2011).
Dahlquist, G. G. “A special stability problem for linear multistep methods”. BIT Numerical Mathematics 3 (1963), pp. 27–43.
Dangel, F., Kunstner, F., and Hennig, P. “BackPACK: Packing more into Backprop”. 8th International Conference on Learning Representations, ICLR. 2020.
Dashti, M. and Stuart, A. M. “The Bayesian Approach to Inverse Problems”. Handbook of Uncertainty Quantification. Springer International Publishing, 2017, pp. 311–428.
Davidon, W. Variable metric method for minimization. Tech. rep. Argonne National Laboratory, Ill., 1959.
Davis, P. “Leonhard Euler’s Integral: A Historical Profile of the Gamma Function”. American Mathematical Monthly 66.10 (1959), pp. 849–869.
Davis, P. and Rabinowitz, P. Methods of Numerical Integration. 2nd edition. Academic Press, 1984.
Dawid, A. “Some matrix-variate distribution theory: Notational considerations and a Bayesian application”. Biometrika 68.1 (1981), pp. 265–274.
Daxberger, E. and Low, B. “Distributed Batch Gaussian Process Optimization”. PMLR. 2017, pp. 951–960.
Daxberger, E. et al. “Laplace Redux – Effortless Bayesian Deep Learning”. Advances in Neural Information Processing Systems, NeurIPS. Vol. 34. 2021.
Demmel, J. W. Applied Numerical Linear Algebra. SIAM, 1997.
Dennis, J. “On some methods based on Broyden’s secant approximations”. Numerical Methods for Non-Linear Optimization. 1971.
Dennis, J. E. and Moré, J. J. “Quasi-Newton methods, motivation and theory”. SIAM Review 19.1 (1977), pp. 46–89.
Desautels, T., Krause, A., and Burdick, J. W. “Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Deuflhard, P. and Bornemann, F. Scientific Computing with Ordinary Differential Equations. Vol. 42. Springer Texts in Applied Mathematics. Springer, 2002.
Diaconis, P. “Bayesian numerical analysis”. Statistical decision theory and related topics IV (1988), pp. 163–175.
Diaconis, P. and Freedman, D. “Finite exchangeable sequences”. The Annals of Probability (1980), pp. 745–764.
Diaconis, P. and Ylvisaker, D. “Conjugate priors for exponential families”. The Annals of Statistics 7.2 (1979), pp. 269–281.
Dick, J., Kuo, F. Y., and Sloan, I. H. “High-dimensional integration: the quasi-Monte Carlo way”. Acta Numerica 22 (2013), pp. 133–288.
Dixon, L. “Quasi-Newton algorithms generate identical points”. Mathematical Programming 2.1 (1972a), pp. 383–387.
Dixon, L. “Quasi Newton techniques generate identical points II: The proofs of four new theorems”. Mathematical Programming 3.1 (1972b), pp. 345–358.
Dormand, J. and Prince, P. “A family of embedded Runge-Kutta formulae”. Journal of Computational and Applied Mathematics (1980), pp. 19–26.
Doucet, A., de Freitas, N., and Gordon, N. “An Introduction to Sequential Monte Carlo Methods”. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York, NY, 2001, pp. 3–14.
Driscoll, M. “The reproducing kernel Hilbert space structure of the sample paths of a Gaussian process”. Probability Theory and Related Fields 26.4 (1973), pp. 309–316.
Eich-Soellner, E. and Führer, C. “Implicit Ordinary Differential Equations”. Numerical Methods in Multibody Dynamics. Vieweg+Teubner Verlag, 1998, pp. 139–192.
Einstein, A. “Zur Theorie der Brownschen Bewegung” [On the theory of Brownian motion]. Annalen der Physik (1906), pp. 371–381.
Faßbender, H. Symplectic methods for the symplectic eigenproblem. Springer Science & Business Media, 2007.
Fillion, N. and Corless, R. M. “On the epistemological analysis of modeling and computational error in the mathematical sciences”. Synthese 191.7 (2014), pp. 1451–1467.
Fletcher, R. “A new approach to variable metric algorithms”. The Computer Journal 13.3 (1970), p. 317.
Fletcher, R. “Conjugate Gradient Methods for Indefinite Systems”. Dundee Biennial Conference on Numerical Analysis. 1975, pp. 73–89.
Fletcher, R. and Reeves, C. “Function minimization by conjugate gradients”. The Computer Journal (1964), pp. 149–154.
Fowler, D. and Robson, E. “Square root approximations in Old Babylonian mathematics: YBC 7289 in context”. Historia Mathematica 25.4 (1998), pp. 366–378.
Frazier, P., Powell, W., and Dayanik, S. “The Knowledge-Gradient Policy for Correlated Normal Beliefs”. INFORMS Journal on Computing 21.4 (2009), pp. 599–613.
Fredholm, E. I. “Sur une classe d’équations fonctionnelles” [On a class of functional equations]. Acta Mathematica 27 (1903), pp. 365–390.
de Freitas, N., Smola, A. J., and Zoghi, M. “Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Freund, R. W. and Nachtigal, N. M. “QMR: a Quasi-Minimal Residual Method for non-Hermitian Linear Systems”. Numerische Mathematik 60.1 (1991), pp. 315–339.
Fröhlich, C. et al. “Bayesian Quadrature on Riemannian Data Manifolds”. Proceedings of the 38th International Conference on Machine Learning, ICML. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 3459–3468.
Garnett, R. Bayesian Optimization. Cambridge University Press, 2022.
Garnett, R., Osborne, M. A., and Roberts, S. J. “Bayesian optimization for sensor set selection”. ACM/IEEE International Conference on Information Processing in Sensor Networks. ACM, 2010, pp. 209–219.
Garnett, R. et al. “Bayesian Optimal Active Search and Surveying”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Gauss, C. F. Theoria motus corporum coelestium in sectionibus conicis solem ambientium [Theory of the motion of the heavenly bodies moving about the sun in conic sections]. F. Perthes and I. H. Besser, 1809.
Gauss, C. F. “Methodus nova integralium valores per approximationem inveniendi” [A new method for finding values of integrals by approximation]. Proceedings of the Royal Scientific Society of Göttingen. Heinrich Dieterich, 1814.
Gautschi, W. Orthogonal Polynomials: Computation and Approximation. Oxford University Press, 2004.
Genz, A. “Numerical computation of rectangular bivariate and trivariate normal and t probabilities”. Statistics and Computing 14.3 (2004), pp. 251–260.
Gerritsma, J., Onnink, R., and Versluis, A. “Geometry, resistance and stability of the Delft systematic yacht hull series”. International Shipbuilding Progress 28.328 (1981).
Ginsbourger, D., Le Riche, R., and Carraro, L. “A multi-points criterion for deterministic parallel global optimization based on Gaussian processes” (2008).
Ginsbourger, D., Le Riche, R., and Carraro, L. “Kriging is well-suited to parallelize optimization”. Computational Intelligence in Expensive Optimization Problems 2 (2010), pp. 131–162.
Girolami, M. et al. “The statistical finite element method (stat-FEM) for coherent synthesis of observation data and model predictions”. Computer Methods in Applied Mechanics and Engineering 375 (2021), p. 113533.
Gittins, J. C. “Bandit processes and dynamic allocation indices”. Journal of the Royal Statistical Society. Series B (Methodological) (1979), pp. 148–177.
Goldberg, D. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
Goldfarb, D. “A family of variable metric updates derived by variational means”. Mathematics of Computation 24.109 (1970), pp. 23–26.
Golub, G. and Van Loan, C. Matrix computations. Johns Hopkins University Press, 1996.
Gómez-Bombarelli, R. et al. “Automatic chemical design using a data-driven continuous representation of molecules”. arXiv:1610.02415 [cs.LG] (2016).
González, J., Osborne, M. A., and Lawrence, N. D. “GLASSES: Relieving The Myopia Of Bayesian Optimisation”. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 51. JMLR Workshop and Conference Proceedings. JMLR.org, 2016, pp. 790–799.
González, J. et al. “Batch Bayesian Optimization via Local Penalization”. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 51. JMLR Workshop and Conference Proceedings. JMLR.org, 2016, pp. 648–657.
Goodfellow, I. J., Bengio, Y., and Courville, A. Deep Learning. MIT Press, 2016.
Gradshteyn, I. and Ryzhik, I. Table of Integrals, Series, and Products. 7th edition. Academic Press, 2007.
Grcar, J. “Mathematicians of Gaussian elimination”. Notices of the AMS 58.6 (2011), pp. 782–792.
Greenbaum, A. Iterative Methods for Solving Linear Systems. Vol. 17. SIAM, 1997.
Greenstadt, J. “Variations on variable-metric methods”. Mathematics of Computation 24 (1970), pp. 1–22.
Grewal, M. S. and Andrews, A. P. Kalman Filtering: Theory and Practice Using MATLAB. John Wiley & Sons, Inc., 2001.
Griewank, A. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics. SIAM, 2000.
Griewank, A. and Walther, A. Evaluating Derivatives. Cambridge University Press, 2008.
Gunter, T. et al. “Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature”. Advances in Neural Information Processing Systems, NeurIPS. 2014, pp. 2789–2797.
Hager, W. “Updating the Inverse of a Matrix”. SIAM Review 31.2 (1989), pp. 221–239.
Hairer, E., Lubich, C., and Wanner, G. Geometric numerical integration: structure-preserving algorithms for ordinary differential equations. Vol. 31. Springer Science & Business Media, 2006.
Hairer, E., Nørsett, S., and Wanner, G. Solving Ordinary Differential Equations I – Nonstiff Problems. 2nd edition. Vol. 8. Springer Series in Computational Mathematics. Springer, 1993.
Hairer, E. and Wanner, G. Solving Ordinary Differential Equations II – Stiff and Differential-Algebraic Problems. 2nd edition. Vol. 14. Springer, 1996.
Hartikainen, J. and Särkkä, S. “Kalman filtering and smoothing solutions to temporal Gaussian process regression models”. IEEE International Workshop on Machine Learning for Signal Processing (MLSP). 2010, pp. 379–384.
Helmert, F. “Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler” [On the computation of the probable error from a finite number of true observation errors]. Zeitschrift für Mathematik und Physik 20 (1875), pp. 300–303.
Henderson, P. et al. “Deep Reinforcement Learning That Matters”. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. AAAI Press, 2018, pp. 3207–3214.
Hennig, P. “Fast Probabilistic Optimization from Noisy Gradients”. Proceedings of the 30th International Conference on Machine Learning, ICML. Vol. 28. JMLR Workshop and Conference Proceedings. JMLR.org, 2013, pp. 62–70.
Hennig, P. “Probabilistic Interpretation of Linear Solvers”. SIAM Journal on Optimization (2015), pp. 210–233.
Hennig, P. and Hauberg, S. “Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics”. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 33. JMLR Workshop and Conference Proceedings. JMLR.org, 2014, pp. 347–355.
Hennig, P. and Kiefel, M. “Quasi-Newton methods: A new direction”. International Conference on Machine Learning, ICML. 2012.
Hennig, P., Osborne, M., and Girolami, M. “Probabilistic numerics and uncertainty in computations”. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 471.2179 (2015).
Hennig, P. and Schuler, C. “Entropy search for information-efficient global optimization”. Journal of Machine Learning Research 13 (2012), pp. 1809–1837.
Hernández-Lobato, J. M. et al. “Predictive Entropy Search for Bayesian Optimization with Unknown Constraints”. Proceedings of the 32nd International Conference on Machine Learning, ICML. Vol. 37. JMLR Workshop and Conference Proceedings. JMLR.org, 2015, pp. 1699–1707.
Hestenes, M. and Stiefel, E. “Methods of conjugate gradients for solving linear systems”. Journal of Research of the National Bureau of Standards 49.6 (1952), pp. 409–436.
Hinton, G. “A practical guide to training restricted Boltzmann machines”. Neural Networks: Tricks of the Trade. Springer, 2012, pp. 599–619.
Hoffman, M., Brochu, E., and de Freitas, N. “Portfolio Allocation for Bayesian Optimization”. UAI 2011, Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2011, pp. 327–336.
Hoffman, M. and Ghahramani, Z. “Output-Space Predictive Entropy Search for Flexible Global Optimization”. NIPS Workshop on Bayesian Optimization. 2015.
Hoos, H. H. “Programming by optimization”. Communications of the ACM 55.2 (2012), pp. 70–80.
Horst, R. and Tuy, H. Global optimization: Deterministic approaches. Springer Science & Business Media, 2013.
Houlsby, N. et al. “Bayesian Active Learning for Classification and Preference Learning”. arXiv:1112.5745 [stat.ML] (2011).
Hull, T. et al. “Comparing numerical methods for ordinary differential equations”. SIAM Journal on Numerical Analysis 9.4 (1972), pp. 603–637.
Huszár, F. and Duvenaud, D. “Optimally-Weighted Herding is Bayesian Quadrature”. Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2012, pp. 377–386.
Hutter, F., Hoos, H., and Leyton-Brown, K. “Sequential Model-Based Optimization for General Algorithm Configuration”. Proceedings of LION-5. 2011, pp. 507–523.
Hutter, M. Universal Artificial Intelligence. Texts in Theoretical Computer Science. Springer, 2010.
Ibragimov, I. and Has’minskii, R. Statistical Estimation: Asymptotic Theory. Springer, New York, 1981.
Ipsen, I. Numerical matrix analysis: Linear systems and least squares. SIAM, 2009.
Islam, R. et al. “Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control”. arXiv:1708.04133 [cs.LG] (2017).
Isserlis, L. “On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables”. Biometrika 12.1/2 (1918), pp. 134–139.
Jaynes, E. and Bretthorst, G. Probability Theory: The Logic of Science. Cambridge University Press, 2003.
Jiang, S. et al. “Efficient Nonmyopic Active Search”. Proceedings of the 34th International Conference on Machine Learning, ICML. Vol. 70. Proceedings of Machine Learning Research. PMLR, 2017, pp. 1714–1723.
Jiang, S. et al. “Efficient nonmyopic Bayesian optimization and quadrature”. arXiv:1909.04568 [cs.LG] (2019).
Jiang, S. “BINOCULARS for efficient, nonmyopic sequential experimental design”. Proceedings of the 37th International Conference on Machine Learning, ICML. Vol. 119. Proceedings of Machine Learning Research. PMLR, 2020, pp. 4794–4803.
John, D., Heuveline, V., and Schober, M. “GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver”. Proceedings of the 36th International Conference on Machine Learning, ICML. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 3152–3162.
Jones, D. “A taxonomy of global optimization methods based on response surfaces”. Journal of Global Optimization 21.4 (2001), pp. 345–383.
Jones, D., Schonlau, M., and Welch, W. “Efficient global optimization of expensive black-box functions”. Journal of Global Optimization 13.4 (1998), pp. 455–492.
Kadane, J. B. and Wasilkowski, G. W. “Average case epsilon-complexity in computer science: A Bayesian view”. Bayesian Statistics 2, Proceedings of the Second Valencia International Meeting. 1985, pp. 361–374.
Kálmán, R. “A New Approach to Linear Filtering and Prediction Problems”. Journal of Fluids Engineering 82.1 (1960), pp. 35–45.
Kanagawa, M. and Hennig, P. “Convergence Guarantees for Adaptive Bayesian Quadrature Methods”. Advances in Neural Information Processing Systems, NeurIPS. 2019, pp. 6234–6245.
Kanagawa, M., Sriperumbudur, B. K., and Fukumizu, K. “Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings”. Foundations of Computational Mathematics 20.1 (2020), pp. 155–194.
Kanagawa, M. et al. “Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences”. arXiv:1807.02582 [stat.ML] (2018).
Karatzas, I. and Shreve, S. E. Brownian Motion and Stochastic Calculus. Springer, 1991.
Karvonen, T. and Särkkä, S. “Classical quadrature rules via Gaussian processes”. IEEE International Workshop on Machine Learning for Signal Processing (MLSP). Vol. 27. 2017.
Kennedy, M. “Bayesian quadrature with non-normal approximating functions”. Statistics and Computing 8.4 (1998), pp. 365–375.
Kersting, H. “Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering”. PhD thesis. Eberhard Karls Universität Tübingen, 2020.
Kersting, H. and Hennig, P. “Active Uncertainty Calibration in Bayesian ODE Solvers”. Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2016.
Kersting, H. and Mahsereci, M. “A Fourier State Space Model for Bayesian ODE Filters”. Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, ICML. 2020.
Kersting, H., Sullivan, T. J., and Hennig, P. “Convergence Rates of Gaussian ODE Filters”. Statistics and Computing 30.6 (2020), pp. 1791–1816.
Kersting, H. et al. “Differentiable Likelihoods for Fast Inversion of ‘Likelihood-Free’ Dynamical Systems”. Proceedings of the 37th International Conference on Machine Learning, ICML. Vol. 119. Proceedings of Machine Learning Research. PMLR, 2020, pp. 5198–5208.
Kimeldorf, G. S. and Wahba, G. “A correspondence between Bayesian estimation on stochastic processes and smoothing by splines”. The Annals of Mathematical Statistics 41.2 (1970), pp. 495–502.
Kingma, D. P. and Ba, J. “Adam: A Method for Stochastic Optimization”. 3rd International Conference on Learning Representations, ICLR. 2015.
Kitagawa, G. “Non-Gaussian State-Space Modeling of Nonstationary Time Series”. Journal of the American Statistical Association 82.400 (1987), pp. 1032–1041.
Klein, A. et al. “Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets”. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 54. Proceedings of Machine Learning Research. PMLR, 2017, pp. 528–536.
Ko, C.-W., Lee, J., and Queyranne, M. “An exact algorithm for maximum entropy sampling”. Operations Research 43.4 (1995), pp. 684–691.
Kochenderfer, M. J. Decision Making Under Uncertainty: Theory and Application. The MIT Press, 2015.
Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
Kolmogorov, A. “Zur Theorie der Markoffschen Ketten” [On the theory of Markov chains]. Mathematische Annalen 112.1 (1936), pp. 155–160.
Kolmogorov, A. “Three approaches to the quantitative definition of information”. International Journal of Computer Mathematics 2.1-4 (1968), pp. 157–168.
Krämer, N. and Hennig, P. “Stable Implementation of Probabilistic ODE Solvers”. arXiv:2012.10106 [stat.ML] (2020).
Krämer, N. and Hennig, P. “Linear-Time Probabilistic Solutions of Boundary Value Problems”. Advances in Neural Information Processing Systems, NeurIPS. 2021.
Krämer, N., Schmidt, J., and Hennig, P. “Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations”. Artificial Intelligence and Statistics (AISTATS). 2022.
Krämer, N. et al. “Probabilistic ODE Solutions in Millions of Dimensions”. arXiv:2110.11812 [stat.ML] (2021).
Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Tech. rep. 2009.
Kushner, H. J. “A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise”. Journal of Basic Engineering 86.1 (1964), pp. 97–106.
Kushner, H. J. “A versatile stochastic model of a function of unknown and time varying form”. Journal of Mathematical Analysis and Applications 5.1 (1962), pp. 150–167.
Lai, T. L. and Robbins, H. “Asymptotically efficient adaptive allocation rules”. Advances in Applied Mathematics 6.1 (1985), pp. 4–22.
Lanczos, C. “An iteration method for the solution of the eigenvalue problem of linear differential and integral operators”. Journal of Research of the National Bureau of Standards 45 (1950), pp. 255–282.
Laplace, P. Théorie Analytique des Probabilités [Analytical Theory of Probabilities]. 2nd edition. V. Courcier, Paris, 1814.
Larkin, F. “Gaussian measure in Hilbert space and applications in numerical analysis”. Rocky Mountain Journal of Mathematics 2.3 (1972).
Lauritzen, S. “Time series analysis in 1880: A discussion of contributions made by T. N. Thiele”. International Statistical Review / Revue Internationale de Statistique (1981), pp. 319–331.
Lauritzen, S. and Spiegelhalter, D. “Local computations with probabilities on graphical structures and their application to expert systems”. Journal of the Royal Statistical Society. Series B (Methodological) 50 (1988), pp. 157–224.
Le Cam, L. “Convergence of estimates under dimensionality restrictions”. Annals of Statistics 1 (1973), pp. 38–53.
Lecomte, C. “Exact statistics of systems with uncertainties: An analytical theory of rank-one stochastic dynamic systems”. Journal of Sound and Vibration 332.11 (2013), pp. 2750–2776.
Lemaréchal, C. “Cauchy and the Gradient Method”. Documenta Mathematica, Extra Volume: Optimization Stories (2012), pp. 251–254.
Lemieux, C. Monte Carlo and quasi-Monte Carlo sampling. Springer Science & Business Media, 2009.
Lie, H. C., Stahn, M., and Sullivan, T. J. “Randomised one-step time integration methods for deterministic operator differential equations”. Calcolo 59.1 (2022), p. 13.
Lie, H. C., Stuart, A. M., and Sullivan, T. J. “Strong convergence rates of probabilistic integrators for ordinary differential equations”. Statistics and Computing 29.6 (2019), pp. 1265–1283.
Lie, H. C., Sullivan, T. J., and Teckentrup, A. L. “Random Forward Models and Log-Likelihoods in Bayesian Inverse Problems”. SIAM/ASA Journal on Uncertainty Quantification 6.4 (2018), pp. 1600–1629.
Lindström, E., Madsen, H., and Nielsen, J. N. Statistics for Finance. Texts in Statistical Science. Chapman and Hall/CRC, 2015.
Lorenz, E. N. “Deterministic Nonperiodic Flow”. Journal of the Atmospheric Sciences 20.2 (1963), pp. 130–141.
Loveland, D. “A new interpretation of the von Mises’ concept of random sequence”. Mathematical Logic Quarterly 12.1 (1966), pp. 279–294.
Luenberger, D. Introduction to Linear and Nonlinear Programming. 2nd edition. Addison Wesley, 1984.
Lütkepohl, H. Handbook of Matrices. Wiley, 1996.
MacKay, D. “The evidence framework applied to classification networks”. Neural Computation 4.5 (1992), pp. 720–736.
MacKay, D. “Introduction to Gaussian processes”. NATO ASI Series F: Computer and Systems Sciences 168 (1998), pp. 133–166.
MacKay, D. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
MacKay, D. The Humble Gaussian Distribution. Tech. rep. Cavendish Laboratory, Cambridge University, 2006.
Magnani, E. et al. “Bayesian Filtering for ODEs with Bounded Derivatives”. arXiv:1709.08471 [cs.NA] (2017).
Mahsereci, M. “Probabilistic Approaches to Stochastic Optimization”. PhD thesis. Eberhard Karls Universität Tübingen, 2018.
Mahsereci, M. and Hennig, P. “Probabilistic Line Searches for Stochastic Optimization”. Advances in Neural Information Processing Systems, NeurIPS. 2015, pp. 181–189.
Mahsereci, M. and Hennig, P. “Probabilistic Line Searches for Stochastic Optimization”. Journal of Machine Learning Research 18.119 (2017), pp. 1–59.
Mahsereci, M. et al. “Early Stopping without a Validation Set”. arXiv:1703.09580 [cs.LG] (2017).
Mania, H., Guy, A., and Recht, B. “Simple random search provides a competitive approach to reinforcement learning”. arXiv:1803.07055 [cs.LG] (2018).
Marchant, R. and Ramos, F. “Bayesian Optimisation for Intelligent Environmental Monitoring”. NIPS workshop on Bayesian Optimization and Decision Making. 2012.
Marchant, R., Ramos, F., and Sanner, S. “Sequential Bayesian Optimisation for Spatial-Temporal Monitoring”. Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2014, pp. 553–562.
Markov, A. “Rasprostranenie zakona bol’shih chisel na velichiny, zavisyaschie drug ot druga (A generalization of the law of large numbers to variables that depend on each other)”. Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete (Proceedings of the Society for Physics and Mathematics at Kazan University) 15 (1906), pp. 135–156.
Marmin, S., Chevalier, C., and Ginsbourger, D. “Differentiating the multipoint Expected Improvement for optimal batch design”. arXiv:1503.05509 [stat.ML] (2015).
Marmin, S., Chevalier, C., and Ginsbourger, D. “Efficient batch-sequential Bayesian optimization with moments of truncated Gaussian vectors”. arXiv:1609.02700 [stat.ML] (2016).
Martínez R., H. J. “Local and Superlinear Convergence of Structured Secant Methods from the Convex Class”. PhD thesis. Rice University, 1988.
Matérn, B. “Spatial variation”. Meddelanden från Statens Skogsforskningsinstitut 49.5 (1960).
Matsuda, T. and Miyatake, Y. “Estimation of Ordinary Differential Equation Models with Discretization Error Quantification”. SIAM/ASA Journal on Uncertainty Quantification 9.1 (2021), pp. 302–331.
Matsumoto, M. and Nishimura, T. “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator”. ACM Transactions on Modeling and Computer Simulation (TOMACS) 8.1 (1998), pp. 3–30.
McLeod, M., Osborne, M. A., and Roberts, S. J. “Practical Bayesian Optimization for Variable Cost Objectives”. arXiv:1703.04335 [stat.ML] (2015).
Meister, A. Numerik Linearer Gleichungssysteme. Springer, 2011.
Minka, T. Deriving quadrature rules from Gaussian processes. Tech. rep. Statistics Department, Carnegie Mellon University, 2000.
Mitchell, M. An introduction to genetic algorithms. MIT Press, 1998.
Mitchell, T. M. The Need for Biases in Learning Generalizations. Tech. rep. CBM-TR 5-110. Rutgers University, 1980.
Mockus, J., Tiesis, V., and Žilinskas, A. “The Application of Bayesian Methods for Seeking the Extremum”. Toward Global Optimization. Vol. 2. Elsevier, 1978.
Moore, E. “On the reciprocal of the general algebraic matrix, abstract”. Bulletin of the American Mathematical Society 26 (1920), pp. 394–395.
Moré, J. J. “Recent developments in algorithms and software for trust region methods”. Mathematical Programming: The State of the Art. 1983, pp. 258–287.
Neal, R. “Annealed importance sampling”. Statistics and Computing 11.2 (2001), pp. 125–139.
Nesterov, Y. “A method of solving a convex programming problem with convergence rate O(1/k²)”. Soviet Mathematics Doklady 27.2 (1983), pp. 372–376.
Netzer, Y. et al. “Reading digits in natural images with unsupervised feature learning”. NIPS Workshop on Deep Learning and Unsupervised Feature Learning. 2011, p. 5.
Nickson, T. et al. “Automated Machine Learning on Big Data using Stochastic Algorithm Tuning”. arXiv:1407.7969 [stat.ML] (2014).
Nocedal, J. and Wright, S. Numerical Optimization. Springer Verlag, 1999.
Nordsieck, A. “On numerical integration of ordinary differential equations”. Mathematics of Computation 16.77 (1962), pp. 22–49.
Novak, E. Deterministic and stochastic error bounds in numerical analysis. Vol. 1349. Springer, 2006.
Nyström, E. “Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben”. Acta Mathematica 54.1 (1930), pp. 185–204.
O’Hagan, A. “Bayes–Hermite quadrature”. Journal of Statistical Planning and Inference (1991), pp. 245–260.
O’Hagan, A. “Some Bayesian Numerical Analysis”. Bayesian Statistics (1992), pp. 345–363.
O’Hagan, A. and Kingman, J. F. C. “Curve Fitting and Optimal Design for Prediction”. Journal of the Royal Statistical Society. Series B 40.1 (1978), pp. 1–42.
O’Neil, C. Weapons of math destruction: How big data increases inequality and threatens democracy. Crown, 2016.
Oates, C. J. and Sullivan, T. J. “A Modern Retrospective on Probabilistic Numerics”. Statistics and Computing 29.6 (2019), pp. 1335–1351.
Oates, C. J., Girolami, M., and Chopin, N. “Control functionals for Monte Carlo integration”. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79.3 (2017), pp. 695–718.
Oates, C. J. et al. “Convergence rates for a class of estimators based on Stein’s method”. Bernoulli 25.2 (2019a), pp. 1141–1159.
Oates, C. J. et al. “Bayesian Probabilistic Numerical Methods in Time-Dependent State Estimation for Industrial Hydrocyclone Equipment”. Journal of the American Statistical Association 114.528 (2019b), pp. 1518–1531.
Oesterle, J. et al. “Numerical uncertainty can critically affect simulations of mechanistic models in neuroscience”. bioRxiv (2021).
Øksendal, B. Stochastic Differential Equations: An Introduction with Applications. 6th edition. Springer, 2003.
Ortega, J. and Rheinboldt, W. Iterative solution of nonlinear equations in several variables. Vol. 30. Classics in Applied Mathematics. SIAM, 1970.
Osborne, M., Garnett, R., and Roberts, S. “Gaussian processes for global optimization”. 3rd International Conference on Learning and Intelligent Optimization (LION3). 2009.
Osborne, M. A. et al. “Active Learning of Model Evidence Using Bayesian Quadrature”. Advances in Neural Information Processing Systems, NeurIPS. 2012, pp. 46–54.
Owhadi, H. “Multigrid with Rough Coefficients and Multiresolution Operator Decomposition from Hierarchical Information Games”. SIAM Review 59.1 (2017), pp. 99–149.
Owhadi, H. and Scovel, C. “Conditioning Gaussian measure on Hilbert space”. arXiv:1506.04208 [math.PR] (2015).
Owhadi, H. and Scovel, C. “Toward Machine Wald”. Springer Handbook of Uncertainty Quantification. Springer, 2016, pp. 1–35.
Owhadi, H. and Zhang, L. “Gamblets for opening the complexity-bottleneck of implicit schemes for hyperbolic and parabolic ODEs/PDEs with rough coefficients”. Journal of Computational Physics 347 (2017), pp. 99–128.
Packel, E. and Traub, J. “Information-based complexity”. Nature 328.6125 (1987), pp. 29–33.
Paleyes, A. et al. “Emulation of physical processes with Emukit”. Second Workshop on Machine Learning and the Physical Sciences, NeurIPS. 2019.
Parlett, B. The Symmetric Eigenvalue Problem. Prentice-Hall, 1980.
Pasquale, F. The black box society: The secret algorithms that control money and information. Harvard University Press, 2015.
Patterson, D. “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink” (2022).
Paul, S., Osborne, M. A., and Whiteson, S. “Fingerprint Policy Optimisation for Robust Reinforcement Learning”. Proceedings of the 36th International Conference on Machine Learning, ICML. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 5082–5091.
Pearl, J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
Pitman, E. “Sufficient statistics and intrinsic accuracy”. Mathematical Proceedings of the Cambridge Philosophical Society 32.4 (1936), pp. 567–579.
Poincaré, H. Calcul des Probabilités. Gauthier-Villars, 1896.
Polyak, B. T. “Some methods of speeding up the convergence of iteration methods”. USSR Computational Mathematics and Mathematical Physics 4.5 (1964), pp. 1–17.
Powell, M. J. D. “A new algorithm for unconstrained optimization”. Nonlinear Programming. Academic Press, 1970.
Powell, M. J. D. “Convergence properties of a class of minimization algorithms”. Nonlinear Programming 2. 1975, pp. 1–27.
Press, W. et al. Numerical Recipes in Fortran 77: The Art of Scientific Computing. Cambridge University Press, 1992.
Quiñonero-Candela, J. and Rasmussen, C. “A unifying view of sparse approximate Gaussian process regression”. Journal of Machine Learning Research 6 (2005), pp. 1939–1959.
Rackauckas, C. et al. “A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions”. arXiv:1812.01892 [math.NA] (2018).
Rahimi, A. and Recht, B. “Random Features for Large-Scale Kernel Machines”. Advances in Neural Information Processing Systems, NeurIPS. Curran Associates, Inc., 2007, pp. 1177–1184.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. “Machine learning of linear differential equations using Gaussian processes”. Journal of Computational Physics 348 (2017), pp. 683–693.
Rasmussen, C. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.
Rauch, H., Striebel, C., and Tung, F. “Maximum likelihood estimates of linear dynamic systems”. Journal of the American Institute of Aeronautics and Astronautics (AIAA) 3.8 (1965), pp. 1445–1450.
Reid, W. Riccati differential equations. Elsevier, 1972.
Riccati, J. “Animadversiones in aequationes differentiales secundi gradus”. Actorum Eruditorum Supplementa 8 (1724), pp. 66–73.
Ritter, K. Average-case analysis of numerical problems. Lecture Notes in Mathematics 1733. Springer, 2000.
Robert, C. and Casella, G. Monte Carlo Statistical Methods. Springer Science & Business Media, 2013.
Rontsis, N., Osborne, M. A., and Goulart, P. J. “Distributionally Robust Optimization Techniques in Batch Bayesian Optimization”. Journal of Machine Learning Research 21.149 (2020), pp. 1–26.
Saad, Y. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
Saad, Y. and Schultz, M. “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems”. SIAM Journal on Scientific and Statistical Computing 7.3 (1986), pp. 856–869.
Sacks, J. and Ylvisaker, D. “Statistical designs and integral approximation”. Proc. 12th Bienn. Semin. Can. Math. Congr. 1970, pp. 115–136.
Sacks, J. and Ylvisaker, D. “Model robust design in regression: Bayes theory”. Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. Vol. 2. 1985, pp. 667–679.
Sard, A. Linear approximation. American Mathematical Society, 1963.
Särkkä, S. Bayesian filtering and smoothing. Cambridge University Press, 2013.
Särkkä, S. and Solin, A. Applied Stochastic Differential Equations. Cambridge University Press, 2019.
Särkkä, S., Solin, A., and Hartikainen, J. “Spatiotemporal learning via infinite-dimensional Bayesian filtering and smoothing: A look at Gaussian process regression through Kalman filtering”. IEEE Signal Processing Magazine 30.4 (2013), pp. 51–61.
Schmidt, J., Krämer, N., and Hennig, P. “A Probabilistic State Space Model for Joint Inference from Differential Equations and Data”. Advances in Neural Information Processing Systems, NeurIPS. 2021.
Schmidt, R. M., Schneider, F., and Hennig, P. “Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers”. Proceedings of the 38th International Conference on Machine Learning. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 9367–9376.
Schneider, F., Dangel, F., and Hennig, P. “Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks”. Advances in Neural Information Processing Systems, NeurIPS. 2021.
Schober, M., Duvenaud, D., and Hennig, P. “Probabilistic ODE Solvers with Runge-Kutta Means”. Advances in Neural Information Processing Systems, NeurIPS. 2014, pp. 739–747.
Schober, M., Särkkä, S., and Hennig, P. “A probabilistic model for the numerical solution of initial value problems”. Statistics and Computing 29.1 (2019), pp. 99–122.
Schober, M. et al. “Probabilistic shortest path tractography in DTI using Gaussian Process ODE solvers”. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, 2014.
Schölkopf, B. “The Kernel Trick for Distances”. Advances in Neural Information Processing Systems, NeurIPS. MIT Press, 2000, pp. 301–307.
Schölkopf, B. and Smola, A. Learning with Kernels. MIT Press, 2002.
Schur, I. “Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind”. Journal für die reine und angewandte Mathematik 147 (1917), pp. 205–232.
Schwartz, R. et al. “Green AI”. arXiv:1907.10597 [cs.CY] (2019).
Scieur, D. et al. “Integration Methods and Optimization Algorithms”. Advances in Neural Information Processing Systems, NeurIPS. 2017, pp. 1109–1118.
Shah, A. and Ghahramani, Z. “Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions”. Advances in Neural Information Processing Systems, NeurIPS. 2015, pp. 3330–3338.
Shahriari, B. et al. “An Entropy Search Portfolio for Bayesian Optimization”. arXiv:1406.4625 [stat.ML] (2014).
Shahriari, B. et al. “Taking the human out of the loop: A review of Bayesian optimization”. Proceedings of the IEEE 104.1 (2016), pp. 148–175.
Shanno, D. “Conditioning of quasi-Newton methods for function minimization”. Mathematics of Computation 24.111 (1970), pp. 647–656.
Skeel, R. “Equivalent Forms of Multistep Formulas”. Mathematics of Computation 33 (1979).
Skilling, J. “Bayesian solution of ordinary differential equations”. Maximum Entropy and Bayesian Methods (1991).
Smola, A. et al. “A Hilbert space embedding for distributions”. International Conference on Algorithmic Learning Theory. 2007, pp. 13–31.
Snelson, E. and Ghahramani, Z. “Sparse Gaussian Processes using Pseudo-inputs”. Advances in Neural Information Processing Systems, NeurIPS. 2005, pp. 1257–1264.
Snoek, J., Larochelle, H., and Adams, R. P. “Practical Bayesian Optimization of Machine Learning Algorithms”. Advances in Neural Information Processing Systems, NeurIPS. 2012, pp. 2960–2968.
Snoek, J. et al. “Scalable Bayesian Optimization Using Deep Neural Networks”. Proceedings of the 32nd International Conference on Machine Learning, ICML. Vol. 37. JMLR Workshop and Conference Proceedings. JMLR.org, 2015, pp. 2171–2180.
Solak, E. et al. “Derivative Observations in Gaussian Process Models of Dynamic Systems”. Advances in Neural Information Processing Systems, NeurIPS. MIT Press, 2002, pp. 1033–1040.
Solin, A. and Särkkä, S. “Explicit Link Between Periodic Covariance Functions and State Space Models”. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 33. JMLR Workshop and Conference Proceedings. JMLR.org, 2014, pp. 904–912.
Sonneveld, P. “CGS, a fast Lanczos-type solver for nonsymmetric linear systems”. SIAM Journal on Scientific and Statistical Computing 10.1 (1989), pp. 36–52.
Spitzbart, A. “A generalization of Hermite’s Interpolation Formula”. The American Mathematical Monthly 67.1 (1960), pp. 42–46.
Springenberg, J. T. et al. “Bayesian Optimization with Robust Bayesian Neural Networks”. Advances in Neural Information Processing Systems, NeurIPS. 2016, pp. 4134–4142.
Srinivas, N. et al. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design”. Proceedings of the 27th International Conference on Machine Learning, ICML. Omnipress, 2010, pp. 1015–1022.
Stein, M. Interpolation of spatial data: some theory for Kriging. Springer Verlag, 1999.
Steinwart, I. “Convergence Types and Rates in Generic Karhunen-Loève Expansions with Applications to Sample Path Properties”. Potential Analysis 51.3 (2019), pp. 361–395.
Steinwart, I. and Christmann, A. Support Vector Machines. Springer Science & Business Media, 2008.
Streltsov, S. and Vakili, P. “A Non-myopic Utility Function for Statistical Global Optimization Algorithms”. Journal of Global Optimization 14.3 (1999), pp. 283–298.
Student. “The probable error of a mean”. Biometrika 6.1 (1908), pp. 1–25.
Sul’din, A. V. “Wiener measure and its applications to approximation methods. I”. Izvestiya Vysshikh Uchebnykh Zavedenii. Matematika 6 (1959), pp. 145–158.
Sul’din, A. V. “Wiener measure and its applications to approximation methods. II”. Izvestiya Vysshikh Uchebnykh Zavedenii. Matematika 5 (1960), pp. 165–179.
Sullivan, T. Introduction to uncertainty quantification. Vol. 63. Texts in Applied Mathematics. Springer, 2015.
Sutton, R. and Barto, A. Reinforcement Learning. MIT Press, 1998.
Swersky, K., Snoek, J., and Adams, R. P. “Freeze-Thaw Bayesian Optimization”. arXiv:1406.3896 [stat.ML] (2014).
Swersky, K. et al. “Raiders of the lost architecture: Kernels for Bayesian optimization in conditional parameter spaces”. NIPS Workshop on Bayesian Optimization in Theory and Practice (BayesOpt ’13). 2013.
Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 2005.
Teschl, G. Ordinary Differential Equations and Dynamical Systems. Vol. 140. Graduate Studies in Mathematics. American Mathematical Society, 2012.
Teymur, O., Zygalakis, K., and Calderhead, B. “Probabilistic Linear Multistep Methods”. Advances in Neural Information Processing Systems, NeurIPS. 2016, pp. 4314–4321.
Teymur, O. et al. “Implicit Probabilistic Integrators for ODEs”. Advances in Neural Information Processing Systems, NeurIPS. 2018, pp. 7255–7264.
Thaler, R. H. “Anomalies: The Winner’s Curse”. Journal of Economic Perspectives 2.1 (1988), pp. 191–202.
Thiele, T. “Om Anvendelse af mindste Kvadraters Methode i nogle Tilfælde, hvor en Komplikation af visse Slags uensartede tilfældige Fejlkilder giver Fejlene en ‘systematisk’ Karakter”. Det Kongelige Danske Videnskabernes Selskabs Skrifter – Naturvidenskabelig og Mathematisk Afdeling (1880), pp. 381–408.
Traub, J., Wasilkowski, G., and Woźniakowski, H. Information, Uncertainty, Complexity. Addison-Wesley Publishing Company, 1983.
Trefethen, L. and Bau, D., III. Numerical Linear Algebra. SIAM, 1997.
Tronarp, F., Bosch, N., and Hennig, P. “Fenrir: Physics-Enhanced Regression for Initial Value Problems”. arXiv:2202.01287 [cs.LG] (2022).
Tronarp, F., Särkkä, S., and Hennig, P. “Bayesian ODE solvers: The maximum a posteriori estimate”. Statistics and Computing 31.3 (2021), pp. 1–18.
Tronarp, F. et al. “Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective”. Statistics and Computing 29.6 (2019), pp. 1297–1315.
Turing, A. “Rounding-off errors in matrix processes”. Quarterly Journal of Mechanics and Applied Mathematics 1.1 (1948), pp. 287–308.
Uhlenbeck, G. and Ornstein, L. “On the theory of the Brownian motion”. Physical Review 36.5 (1930), p. 823.
van Loan, C. “The ubiquitous Kronecker product”. Journal of Computational and Applied Mathematics 123 (2000), pp. 85–100.
Vijayakumar, S. and Schaal, S. “Locally Weighted Projection Regression: Incremental Real Time Learning in High Dimensional Space”. Proceedings of the Seventeenth International Conference on Machine Learning, ICML. Morgan Kaufmann, 2000, pp. 1079–1086.
Villemonteix, J., Vazquez, E., and Walter, E. “An informational approach to the global optimization of expensive-to-evaluate functions”. Journal of Global Optimization 44.4 (2009), pp. 509–534.
Von Neumann, J. “Various techniques used in connection with random digits”. Monte Carlo Method. Vol. 12. National Bureau of Standards Applied Mathematics Series. 1951, pp. 36–38.
Wahba, G. Spline models for observational data. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1990.
Wang, J., Cockayne, J., and Oates, C. “On the Bayesian Solution of Differential Equations”. Proceedings of the 38th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (2018).
Wang, J. et al. “Parallel Bayesian Global Optimization of Expensive Functions”. arXiv:1602.05149 [stat.ML] (2016).
Wang, J. et al. “Bayesian numerical methods for nonlinear partial differential equations”. Statistics and Computing 31.55 (2021).
Wang, Z. and Jegelka, S. “Max-value Entropy Search for Efficient Bayesian Optimization”. Proceedings of the 34th International Conference on Machine Learning, ICML. Vol. 70. Proceedings of Machine Learning Research. PMLR, 2017, pp. 3627–3635.
Warth, W. and Werner, J. “Effiziente Schrittweitenfunktionen bei unrestringierten Optimierungsaufgaben”. Computing 19.1 (1977), pp. 59–72.
Weise, T. “Global optimization algorithms – theory and application”. Self-published (2009), pp. 25–26.
Wendland, H. and Rieger, C. “Approximate Interpolation with Applications to Selecting Smoothing Parameters”. Numerische Mathematik 101.4 (2005), pp. 729–748.
Wenger, J. and Hennig, P. “Probabilistic Linear Solvers for Machine Learning”. Advances in Neural Information Processing Systems, NeurIPS (2020).
Wenger, J. et al. “ProbNum: Probabilistic Numerics in Python”. 2021.
Werner, J. “Über die globale Konvergenz von Variable-Metrik-Verfahren mit nicht-exakter Schrittweitenbestimmung”. Numerische Mathematik 31.3 (1978), pp. 321–334.
Wiener, N. “Differential space”. Journal of Mathematics and Physics 2 (1923), pp. 131–174.
Wilkinson, J. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.
Williams, C. K. I. and Seeger, M. W. “Using the Nyström Method to Speed Up Kernel Machines”. Advances in Neural Information Processing Systems, NeurIPS. MIT Press, 2000, pp. 682–688.
Wills, A. G. and Schön, T. B. “On the construction of probabilistic Newton-type algorithms”. IEEE Conference on Decision and Control (CDC). Vol. 56. 2017.
Winfield, D. H. “Function and functional optimization by interpolation in data tables”. PhD thesis. Harvard University, 1970.
Wishart, J. “The generalised product moment distribution in samples from a normal multivariate population”. Biometrika (1928), pp. 32–52.
Wolfe, P. “Convergence conditions for ascent methods”. SIAM Review 11.2 (1969), pp. 226–235.
Wu, J. and Frazier, P. I. “The Parallel Knowledge Gradient Method for Batch Bayesian Optimization”. Advances in Neural Information Processing Systems, NeurIPS. 2016, pp. 3126–3134.
Wu, J. et al. “Bayesian Optimization with Gradients”. Advances in Neural Information Processing Systems, NeurIPS. 2017, pp. 5267–5278.
Zeiler, M. D. “ADADELTA: An Adaptive Learning Rate Method”. arXiv:1212.5701 [cs.LG] (2012).