
References

Published online by Cambridge University Press:  01 June 2022

Philipp Hennig, Eberhard-Karls-Universität Tübingen, Germany
Michael A. Osborne, University of Oxford
Hans P. Kersting, Ecole Normale Supérieure, Paris

Type: Chapter
In: Probabilistic Numerics: Computation as Machine Learning, pp. 369–394
Publisher: Cambridge University Press
Print publication year: 2022



Abdulle, A. and Garegnani, G. “Random time step probabilistic methods for uncertainty quantification in chaotic and geometric numerical integration”. Statistics and Computing 30.4 (2020), pp. 907–932.
Abdulle, A. and Garegnani, G. “A probabilistic finite element method based on random meshes: A posteriori error estimators and Bayesian inverse problems”. Computer Methods in Applied Mechanics and Engineering 384 (2021), p. 113961.
Adler, R. The Geometry of Random Fields. Wiley, 1981.
Adler, R. An introduction to continuity, extrema, and related topics for general Gaussian processes. Vol. 12. Lecture Notes-Monograph Series. Institute of Mathematical Statistics, 1990.
Ajne, B. and Dalenius, T. “Några tillämpningar av statistiska idéer på numerisk integration” [Some applications of statistical ideas to numerical integration]. Nordisk Matematisk Tidskrift 8.4 (1960), pp. 145–152.
Akhiezer, N. I. and Glazman, I. M. Theory of linear operators in Hilbert space. Vols. I & II. Courier Corporation, 2013.
Alizadeh, F., Haeberly, J.-P. A., and Overton, M. L. “Primal-dual interior-point methods for semidefinite programming: Convergence rates, stability and numerical results”. SIAM Journal on Optimization (1998), pp. 746–768.
Anderson, B. and Moore, J. Optimal Filtering. Prentice-Hall, 1979.
Anderson, E. et al. LAPACK Users’ Guide. 3rd edition. Society for Industrial and Applied Mathematics (SIAM), 1999.
Arcangéli, R., López de Silanes, M. C., and Torrens, J. J. “An extension of a bound for functions in Sobolev spaces, with applications to (m, s)-spline interpolation and smoothing”. Numerische Mathematik 107.2 (2007), pp. 181–211.
Armijo, L. “Minimization of functions having Lipschitz continuous first partial derivatives”. Pacific Journal of Mathematics (1966), pp. 1–3.
Arnold, V. I. Ordinary Differential Equations. Universitext. Springer, 1992.
Arnoldi, W. “The principle of minimized iterations in the solution of the matrix eigenvalue problem”. Quarterly of Applied Mathematics 9.1 (1951), pp. 17–29.
Aronszajn, N. “Theory of reproducing kernels”. Transactions of the AMS (1950), pp. 337–404.
Arvanitidis, G. et al. “Fast and Robust Shortest Paths on Manifolds Learned from Data”. The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 89. Proceedings of Machine Learning Research. PMLR, 2019, pp. 1506–1515.
Azimi, J., Fern, A., and Fern, X. Z. “Batch Bayesian Optimization via Simulation Matching”. Advances in Neural Information Processing Systems, NeurIPS. Curran Associates, Inc., 2010, pp. 109–117.
Azimi, J., Jalali, A., and Fern, X. Z. “Hybrid Batch Bayesian Optimization”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Bach, F. “On the equivalence between kernel quadrature rules and random feature expansions”. Journal of Machine Learning Research (JMLR) 18.21 (2017), pp. 1–38.
Bach, F., Lacoste-Julien, S., and Obozinski, G. “On the Equivalence between Herding and Conditional Gradient Algorithms”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Baker, C. The numerical treatment of integral equations. Oxford: Clarendon Press, 1973.
Balles, L. and Hennig, P. “Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients”. Proceedings of the 35th International Conference on Machine Learning, ICML. Vol. 80. Proceedings of Machine Learning Research. PMLR, 2018, pp. 413–422.
Balles, L., Romero, J., and Hennig, P. “Coupling Adaptive Batch Sizes with Learning Rates”. Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2017.
Bapat, R. Nonnegative Matrices and Applications. Cambridge University Press, 1997.
Barber, D. Bayesian reasoning and machine learning. Cambridge University Press, 2012.
Bardenet, R. and Hardy, A. “Monte Carlo with Determinantal Point Processes”. Annals of Applied Probability (2019).
Bartels, S. et al. “Probabilistic linear solvers: a unifying view”. Statistics and Computing 29.6 (2019), pp. 1249–1263.
Belhadji, A., Bardenet, R., and Chainais, P. “Kernel quadrature with DPPs”. Advances in Neural Information Processing Systems, NeurIPS. 2019, pp. 12907–12917.
Bell, B. M. “The Iterated Kalman Smoother as a Gauss–Newton Method”. SIAM Journal on Optimization 4.3 (1994), pp. 626–636.
Bell, B. M. and Cathey, F. W. “The iterated Kalman filter update as a Gauss–Newton method”. IEEE Transactions on Automatic Control 38.2 (1993), pp. 294–297.
Benoit. “Note sur une méthode de résolution des équations normales provenant de l’application de la méthode des moindres carrés à un système d’équations linéaires en nombre inférieur à celui des inconnues. Application de la méthode à la résolution d’un système défini d’équations linéaires. (Procédé du Commandant Cholesky)” [Note on a method for solving the normal equations arising from applying least squares to a system of linear equations fewer in number than the unknowns; application of the method to solving a determined system of linear equations (Commandant Cholesky’s procedure)]. Bulletin Géodésique (1924), pp. 67–77.
Berg, C., Christensen, J., and Ressel, P. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer, 1984.
Bergstra, J. et al. “Algorithms for Hyper-Parameter Optimization”. Advances in Neural Information Processing Systems, NeurIPS. 2011, pp. 2546–2554.
Bertsekas, D. Nonlinear programming. Athena Scientific, 1999.
Bettencourt, J., Johnson, M., and Duvenaud, D. “Taylor-mode automatic differentiation for higher-order derivatives”. NeurIPS 2019 Workshop Program Transformations. 2019.
Bini, D., Iannazzo, B., and Meini, B. Numerical solution of algebraic Riccati equations. SIAM, 2011.
Bishop, C. Pattern Recognition and Machine Learning. Springer, 2006.
Björck, Å. Numerical Methods in Matrix Computations. Springer, 2015.
Blight, B. J. N. and Ott, L. “A Bayesian Approach to Model Inadequacy for Polynomial Regression”. Biometrika 62.1 (1975), pp. 79–88.
Borodin, A. N. and Salminen, P. Handbook of Brownian Motion – Facts and Formulae. 2nd edition. Probability and Its Applications. Birkhäuser Basel, 2002.
Bosch, N., Hennig, P., and Tronarp, F. “Calibrated adaptive probabilistic ODE solvers”. Artificial Intelligence and Statistics (AISTATS). 2021, pp. 3466–3474.
Bosch, N., Tronarp, F., and Hennig, P. “Pick-and-Mix Information Operators for Probabilistic ODE Solvers”. Artificial Intelligence and Statistics (AISTATS). 2022.
Bottou, L., Curtis, F. E., and Nocedal, J. “Optimization Methods for Large-Scale Machine Learning”. arXiv:1606.04838 [stat.ML] (2016).
Bougerol, P. “Kalman filtering with random coefficients and contractions”. SIAM Journal on Control and Optimization 31.4 (1993), pp. 942–959.
Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.
Bretthorst, G. Bayesian Spectrum Analysis and Parameter Estimation. Vol. 48. Lecture Notes in Statistics. Springer, 1988.
Briol, F. et al. “Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees”. Advances in Neural Information Processing Systems, NeurIPS. 2015, pp. 1162–1170.
Briol, F.-X. et al. “Probabilistic Integration: A Role in Statistical Computation?” Statistical Science 34.1 (2019), pp. 1–22.
Broyden, C. “A new double-rank minimization algorithm”. Notices of the AMS 16 (1969), p. 670.
Butcher, J. C. Numerical Methods for Ordinary Differential Equations. 3rd edition. John Wiley & Sons, 2016.
Calandra, R. et al. “Bayesian Gait Optimization for Bipedal Locomotion”. Learning and Intelligent OptimizatioN (LION8). 2014a, pp. 274–290.
Calandra, R. et al. “An experimental comparison of Bayesian optimization for bipedal locomotion”. Proceedings of the International Conference on Robotics and Automation (ICRA). 2014b.
Cashore, J. M., Kumarga, L., and Frazier, P. I. “Multi-Step Bayesian Optimization for One-Dimensional Feasibility Determination” (2015).
Chai, H. R. and Garnett, R. “Improving Quadrature for Constrained Integrands”. The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 89. Proceedings of Machine Learning Research. PMLR, 2019, pp. 2751–2759.
Chen, R. T. Q. et al. “Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering”. Proceedings on “I Can’t Believe It’s Not Better!” at NeurIPS Workshops. Vol. 137. Proceedings of Machine Learning Research. PMLR, 2020, pp. 60–69.
Chen, Y., Welling, M., and Smola, A. J. “Super-Samples from Kernel Herding”. Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2010, pp. 109–116.
Chen, Y. et al. “Bayesian optimization in AlphaGo”. arXiv:1812.06855 [cs.LG] (2018).
Chevalier, C. and Ginsbourger, D. “Fast Computation of the Multi-Points Expected Improvement with Applications in Batch Selection”. Learning and Intelligent Optimization. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, pp. 59–69.
Chkrebtii, O. A. and Campbell, D. “Adaptive step-size selection for state-space probabilistic differential equation solvers”. Statistics and Computing 29 (2019), pp. 1285–1295.
Chkrebtii, O. A. et al. “Bayesian solution uncertainty quantification for differential equations”. Bayesian Analysis 11.4 (2016), pp. 1239–1267.
Church, A. “On the concept of a random sequence”. Bulletin of the AMS 46.2 (1940), pp. 130–135.
Cockayne, J. et al. “Probabilistic Numerical Methods for Partial Differential Equations and Bayesian Inverse Problems”. arXiv:1605.07811v3 [stat.ME] (2017a).
Cockayne, J. et al. “A Bayesian conjugate gradient method (with discussion)”. Bayesian Analysis 14.3 (2019a), pp. 937–1012.
Cockayne, J. et al. “Bayesian Probabilistic Numerical Methods”. SIAM Review 61.4 (2019b), pp. 756–789.
Cockayne, J. et al. “Probabilistic numerical methods for PDE-constrained Bayesian inverse problems”. AIP Conference Proceedings 1853.1 (2017b), p. 060001.
Colas, C., Sigaud, O., and Oudeyer, P.-Y. “How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments”. arXiv:1806.08295 [cs.LG] (2018).
Conrad, P. R. et al. “Statistical analysis of differential equations: introducing probability measures on numerical solutions”. Statistics and Computing 27.4 (2017), pp. 1065–1082.
Cottle, R. “Manifestations of the Schur complement”. Linear Algebra and its Applications 8 (1974), pp. 189–211.
Cox, R. “Probability, frequency and reasonable expectation”. American Journal of Physics 14.1 (1946), pp. 1–13.
Cranmer, K., Brehmer, J., and Louppe, G. “The frontier of simulation-based inference”. Proceedings of the National Academy of Sciences (2020).
Cunningham, J., Hennig, P., and Lacoste-Julien, S. “Gaussian Probabilities and Expectation Propagation”. arXiv:1111.6832 [stat.ML] (2011).
Dahlquist, G. G. “A special stability problem for linear multistep methods”. BIT Numerical Mathematics 3 (1963), pp. 27–43.
Dangel, F., Kunstner, F., and Hennig, P. “BackPACK: Packing more into Backprop”. 8th International Conference on Learning Representations, ICLR. 2020.
Dashti, M. and Stuart, A. M. “The Bayesian Approach to Inverse Problems”. Handbook of Uncertainty Quantification. Springer International Publishing, 2017, pp. 311–428.
Davidon, W. Variable metric method for minimization. Tech. rep. Argonne National Laboratory, Ill., 1959.
Davis, P. “Leonhard Euler’s Integral: A Historical Profile of the Gamma Function”. American Mathematical Monthly 66.10 (1959), pp. 849–869.
Davis, P. and Rabinowitz, P. Methods of Numerical Integration. 2nd edition. Academic Press, 1984.
Dawid, A. “Some matrix-variate distribution theory: Notational considerations and a Bayesian application”. Biometrika 68.1 (1981), pp. 265–274.
Daxberger, E. and Low, B. “Distributed Batch Gaussian Process Optimization”. PMLR. 2017, pp. 951–960.
Daxberger, E. et al. “Laplace Redux – Effortless Bayesian Deep Learning”. Advances in Neural Information Processing Systems, NeurIPS. Vol. 34. 2021.
Demmel, J. W. Applied Numerical Linear Algebra. SIAM, 1997.
Dennis, J. “On some methods based on Broyden’s secant approximations”. Numerical Methods for Non-Linear Optimization. 1971.
Dennis, J. E. and Moré, J. J. “Quasi-Newton methods, motivation and theory”. SIAM Review 19.1 (1977), pp. 46–89.
Desautels, T., Krause, A., and Burdick, J. W. “Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Deuflhard, P. and Bornemann, F. Scientific Computing with Ordinary Differential Equations. Vol. 42. Springer Texts in Applied Mathematics. Springer, 2002.
Diaconis, P. “Bayesian numerical analysis”. Statistical decision theory and related topics IV (1988), pp. 163–175.
Diaconis, P. and Freedman, D. “Finite exchangeable sequences”. The Annals of Probability (1980), pp. 745–764.
Diaconis, P. and Ylvisaker, D. “Conjugate priors for exponential families”. The Annals of Statistics 7.2 (1979), pp. 269–281.
Dick, J., Kuo, F. Y., and Sloan, I. H. “High-dimensional integration: the quasi-Monte Carlo way”. Acta Numerica 22 (2013), pp. 133–288.
Dixon, L. “Quasi-Newton algorithms generate identical points”. Mathematical Programming 2.1 (1972a), pp. 383–387.
Dixon, L. “Quasi Newton techniques generate identical points II: The proofs of four new theorems”. Mathematical Programming 3.1 (1972b), pp. 345–358.
Dormand, J. and Prince, P. “A family of embedded Runge-Kutta formulae”. Journal of Computational and Applied Mathematics (1980), pp. 19–26.
Doucet, A., de Freitas, N., and Gordon, N. “An Introduction to Sequential Monte Carlo Methods”. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York, NY, 2001, pp. 3–14.
Driscoll, M. “The reproducing kernel Hilbert space structure of the sample paths of a Gaussian process”. Probability Theory and Related Fields 26.4 (1973), pp. 309–316.
Eich-Soellner, E. and Führer, C. “Implicit Ordinary Differential Equations”. Numerical Methods in Multibody Dynamics. Vieweg+Teubner Verlag, 1998, pp. 139–192.
Einstein, A. “Zur Theorie der Brownschen Bewegung” [On the theory of Brownian motion]. Annalen der Physik (1906), pp. 371–381.
Faßbender, H. Symplectic methods for the symplectic eigenproblem. Springer Science & Business Media, 2007.
Fillion, N. and Corless, R. M. “On the epistemological analysis of modeling and computational error in the mathematical sciences”. Synthese 191.7 (2014), pp. 1451–1467.
Fletcher, R. “A new approach to variable metric algorithms”. The Computer Journal 13.3 (1970), p. 317.
Fletcher, R. “Conjugate Gradient Methods for Indefinite Systems”. Dundee Biennial Conference on Numerical Analysis. 1975, pp. 73–89.
Fletcher, R. and Reeves, C. “Function minimization by conjugate gradients”. The Computer Journal (1964), pp. 149–154.
Fowler, D. and Robson, E. “Square root approximations in Old Babylonian mathematics: YBC 7289 in context”. Historia Mathematica 25.4 (1998), pp. 366–378.
Frazier, P., Powell, W., and Dayanik, S. “The Knowledge-Gradient Policy for Correlated Normal Beliefs”. INFORMS Journal on Computing 21.4 (2009), pp. 599–613.
Fredholm, E. I. “Sur une classe d’équations fonctionnelles” [On a class of functional equations]. Acta Mathematica 27 (1903), pp. 365–390.
de Freitas, N., Smola, A. J., and Zoghi, M. “Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Freund, R. W. and Nachtigal, N. M. “QMR: a Quasi-Minimal Residual Method for non-Hermitian Linear Systems”. Numerische Mathematik 60.1 (1991), pp. 315–339.
Fröhlich, C. et al. “Bayesian Quadrature on Riemannian Data Manifolds”. Proceedings of the 38th International Conference on Machine Learning, ICML. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 3459–3468.
Garnett, R. Bayesian Optimization. Cambridge University Press, 2022.
Garnett, R., Osborne, M. A., and Roberts, S. J. “Bayesian optimization for sensor set selection”. ACM/IEEE International Conference on Information Processing in Sensor Networks. ACM, 2010, pp. 209–219.
Garnett, R. et al. “Bayesian Optimal Active Search and Surveying”. Proceedings of the 29th International Conference on Machine Learning, ICML. icml.cc / Omnipress, 2012.
Gauss, C. F. Theoria motus corporum coelestium in sectionibus conicis solem ambientium [Theory of the motion of the heavenly bodies moving about the sun in conic sections]. F. Perthes and I. H. Besser, 1809.
Gauss, C. F. “Methodus nova integralium valores per approximationem inveniendi” [A new method for finding values of integrals by approximation]. Proceedings of the Royal Scientific Society of Göttingen. Heinrich Dieterich, 1814.
Gautschi, W. Orthogonal Polynomials: Computation and Approximation. Oxford University Press, 2004.
Genz, A. “Numerical computation of rectangular bivariate and trivariate normal and t probabilities”. Statistics and Computing 14.3 (2004), pp. 251–260.
Gerritsma, J., Onnink, R., and Versluis, A. “Geometry, resistance and stability of the Delft systematic yacht hull series”. International Shipbuilding Progress 28.328 (1981).
Ginsbourger, D., Le Riche, R., and Carraro, L. “A multi-points criterion for deterministic parallel global optimization based on Gaussian processes” (2008).
Ginsbourger, D., Le Riche, R., and Carraro, L. “Kriging is well-suited to parallelize optimization”. Computational Intelligence in Expensive Optimization Problems 2 (2010), pp. 131–162.
Girolami, M. et al. “The statistical finite element method (stat-FEM) for coherent synthesis of observation data and model predictions”. Computer Methods in Applied Mechanics and Engineering 375 (2021), p. 113533.
Gittins, J. C. “Bandit processes and dynamic allocation indices”. Journal of the Royal Statistical Society. Series B (Methodological) (1979), pp. 148–177.
Goldberg, D. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
Goldfarb, D. “A family of variable metric updates derived by variational means”. Mathematics of Computation 24.109 (1970), pp. 23–26.
Golub, G. and Van Loan, C. Matrix computations. Johns Hopkins University Press, 1996.
Gómez-Bombarelli, R. et al. “Automatic chemical design using a data-driven continuous representation of molecules”. arXiv:1610.02415 [cs.LG] (2016).
González, J., Osborne, M. A., and Lawrence, N. D. “GLASSES: Relieving The Myopia Of Bayesian Optimisation”. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 51. JMLR Workshop and Conference Proceedings. JMLR.org, 2016, pp. 790–799.
González, J. et al. “Batch Bayesian Optimization via Local Penalization”. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 51. JMLR Workshop and Conference Proceedings. JMLR.org, 2016, pp. 648–657.
Goodfellow, I. J., Bengio, Y., and Courville, A. Deep Learning. MIT Press, 2016.
Gradshteyn, I. and Ryzhik, I. Table of Integrals, Series, and Products. 7th edition. Academic Press, 2007.
Grcar, J. “Mathematicians of Gaussian elimination”. Notices of the AMS 58.6 (2011), pp. 782–792.
Greenbaum, A. Iterative Methods for Solving Linear Systems. Vol. 17. SIAM, 1997.
Greenstadt, J. “Variations on variable-metric methods”. Mathematics of Computation 24 (1970), pp. 1–22.
Grewal, M. S. and Andrews, A. P. Kalman Filtering: Theory and Practice Using MATLAB. John Wiley & Sons, Inc., 2001.
Griewank, A. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics. SIAM, 2000.
Griewank, A. and Walther, A. Evaluating Derivatives. Cambridge University Press, 2008.
Gunter, T. et al. “Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature”. Advances in Neural Information Processing Systems, NeurIPS. 2014, pp. 2789–2797.
Hager, W. “Updating the Inverse of a Matrix”. SIAM Review 31.2 (1989), pp. 221–239.
Hairer, E., Lubich, C., and Wanner, G. Geometric numerical integration: structure-preserving algorithms for ordinary differential equations. Vol. 31. Springer Science & Business Media, 2006.
Hairer, E., Nørsett, S., and Wanner, G. Solving Ordinary Differential Equations I – Nonstiff Problems. 2nd edition. Vol. 8. Springer Series in Computational Mathematics. Springer, 1993.
Hairer, E. and Wanner, G. Solving Ordinary Differential Equations II – Stiff and Differential-Algebraic Problems. 2nd edition. Vol. 14. Springer, 1996.
Hartikainen, J. and Särkkä, S. “Kalman filtering and smoothing solutions to temporal Gaussian process regression models”. IEEE International Workshop on Machine Learning for Signal Processing (MLSP). 2010, pp. 379–384.
Helmert, F. “Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler” [On the computation of the probable error from a finite number of true observation errors]. Zeitschrift für Mathematik und Physik 20 (1875), pp. 300–303.
Henderson, P. et al. “Deep Reinforcement Learning That Matters”. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. AAAI Press, 2018, pp. 3207–3214.
Hennig, P. “Fast Probabilistic Optimization from Noisy Gradients”. Proceedings of the 30th International Conference on Machine Learning, ICML. Vol. 28. JMLR Workshop and Conference Proceedings. JMLR.org, 2013, pp. 62–70.
Hennig, P. “Probabilistic Interpretation of Linear Solvers”. SIAM Journal on Optimization (2015), pp. 210–233.
Hennig, P. and Hauberg, S. “Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics”. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 33. JMLR Workshop and Conference Proceedings. JMLR.org, 2014, pp. 347–355.
Hennig, P. and Kiefel, M. “Quasi-Newton methods: A new direction”. International Conference on Machine Learning, ICML. 2012.
Hennig, P., Osborne, M., and Girolami, M. “Probabilistic numerics and uncertainty in computations”. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 471.2179 (2015).
Hennig, P. and Schuler, C. “Entropy search for information-efficient global optimization”. Journal of Machine Learning Research 13 (2012), pp. 1809–1837.
Hernández-Lobato, J. M. et al. “Predictive Entropy Search for Bayesian Optimization with Unknown Constraints”. Proceedings of the 32nd International Conference on Machine Learning, ICML. Vol. 37. JMLR Workshop and Conference Proceedings. JMLR.org, 2015, pp. 1699–1707.
Hestenes, M. and Stiefel, E. “Methods of conjugate gradients for solving linear systems”. Journal of Research of the National Bureau of Standards 49.6 (1952), pp. 409–436.
Hinton, G. “A practical guide to training restricted Boltzmann machines”. Neural Networks: Tricks of the Trade. Springer, 2012, pp. 599–619.
Hoffman, M., Brochu, E., and de Freitas, N. “Portfolio Allocation for Bayesian Optimization”. UAI 2011, Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2011, pp. 327–336.
Hoffman, M. and Ghahramani, Z. “Output-Space Predictive Entropy Search for Flexible Global Optimization”. NIPS Workshop on Bayesian Optimization. 2015.
Hoos, H. H. “Programming by optimization”. Communications of the ACM 55.2 (2012), pp. 70–80.
Horst, R. and Tuy, H. Global optimization: Deterministic approaches. Springer Science & Business Media, 2013.
Houlsby, N. et al. “Bayesian Active Learning for Classification and Preference Learning”. arXiv:1112.5745 [stat.ML] (2011).
Hull, T. et al. “Comparing numerical methods for ordinary differential equations”. SIAM Journal on Numerical Analysis 9.4 (1972), pp. 603–637.
Huszár, F. and Duvenaud, D. “Optimally-Weighted Herding is Bayesian Quadrature”. Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2012, pp. 377–386.
Hutter, F., Hoos, H., and Leyton-Brown, K. “Sequential Model-Based Optimization for General Algorithm Configuration”. Proceedings of LION-5. 2011, pp. 507–523.
Hutter, M. Universal Artificial Intelligence. Texts in Theoretical Computer Science. Springer, 2010.
Ibragimov, I. and Has’minskii, R. Statistical Estimation: Asymptotic Theory. Springer, New York, 1981.
Ipsen, I. Numerical matrix analysis: Linear systems and least squares. SIAM, 2009.
Islam, R. et al. “Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control”. arXiv:1708.04133 [cs.LG] (2017).
Isserlis, L. “On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables”. Biometrika 12.1/2 (1918), pp. 134–139.
Jaynes, E. and Bretthorst, G. Probability Theory: The Logic of Science. Cambridge University Press, 2003.
Jiang, S. et al. “Efficient Nonmyopic Active Search”. Proceedings of the 34th International Conference on Machine Learning, ICML. Vol. 70. Proceedings of Machine Learning Research. PMLR, 2017, pp. 1714–1723.
Jiang, S. et al. “Efficient nonmyopic Bayesian optimization and quadrature”. arXiv:1909.04568 [cs.LG] (2019).
Jiang, S. “BINOCULARS for efficient, nonmyopic sequential experimental design”. Proceedings of the 37th International Conference on Machine Learning, ICML. Vol. 119. Proceedings of Machine Learning Research. PMLR, 2020, pp. 4794–4803.
John, D., Heuveline, V., and Schober, M. “GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver”. Proceedings of the 36th International Conference on Machine Learning, ICML. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 3152–3162.
Jones, D. “A taxonomy of global optimization methods based on response surfaces”. Journal of Global Optimization 21.4 (2001), pp. 345–383.
Jones, D., Schonlau, M., and Welch, W. “Efficient global optimization of expensive black-box functions”. Journal of Global Optimization 13.4 (1998), pp. 455–492.
Kadane, J. B. and Wasilkowski, G. W. “Average case epsilon-complexity in computer science: A Bayesian view”. Bayesian Statistics 2, Proceedings of the Second Valencia International Meeting. 1985, pp. 361–374.
Kálmán, R. “A New Approach to Linear Filtering and Prediction Problems”. Journal of Fluids Engineering 82.1 (1960), pp. 35–45.
Kanagawa, M. and Hennig, P. “Convergence Guarantees for Adaptive Bayesian Quadrature Methods”. Advances in Neural Information Processing Systems, NeurIPS. 2019, pp. 6234–6245.
Kanagawa, M., Sriperumbudur, B. K., and Fukumizu, K. “Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings”. Foundations of Computational Mathematics 20.1 (2020), pp. 155–194.
Kanagawa, M. et al. “Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences”. arXiv:1807.02582 [stat.ML] (2018).
Karatzas, I. and Shreve, S. E. Brownian Motion and Stochastic Calculus. Springer, 1991.
Karvonen, T. and Särkkä, S. “Classical quadrature rules via Gaussian processes”. IEEE International Workshop on Machine Learning for Signal Processing (MLSP). Vol. 27. 2017.
Kennedy, M. “Bayesian quadrature with non-normal approximating functions”. Statistics and Computing 8.4 (1998), pp. 365–375.
Kersting, H. “Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering”. PhD thesis. Eberhard Karls Universität Tübingen, 2020.
Kersting, H. and Hennig, P. “Active Uncertainty Calibration in Bayesian ODE Solvers”. Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2016.
Kersting, H. and Mahsereci, M. “A Fourier State Space Model for Bayesian ODE Filters”. Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, ICML. 2020.
Kersting, H., Sullivan, T. J., and Hennig, P. “Convergence Rates of Gaussian ODE Filters”. Statistics and Computing 30.6 (2020), pp. 1791–1816.
Kersting, H. et al. “Differentiable Likelihoods for Fast Inversion of ‘Likelihood-Free’ Dynamical Systems”. Proceedings of the 37th International Conference on Machine Learning, ICML. Vol. 119. Proceedings of Machine Learning Research. PMLR, 2020, pp. 5198–5208.
Kimeldorf, G. S. and Wahba, G. “A correspondence between Bayesian estimation on stochastic processes and smoothing by splines”. The Annals of Mathematical Statistics 41.2 (1970), pp. 495–502.
Kingma, D. P. and Ba, J. “Adam: A Method for Stochastic Optimization”. 3rd International Conference on Learning Representations, ICLR. 2015.
Kitagawa, G. “Non-Gaussian State-Space Modeling of Nonstationary Time Series”. Journal of the American Statistical Association 82.400 (1987), pp. 1032–1041.
Klein, A. et al. “Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets”. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 54. Proceedings of Machine Learning Research. PMLR, 2017, pp. 528–536.
Ko, C.-W., Lee, J., and Queyranne, M. “An exact algorithm for maximum entropy sampling”. Operations Research 43.4 (1995), pp. 684–691.
Kochenderfer, M. J. Decision Making Under Uncertainty: Theory and Application. The MIT Press, 2015.
Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
Kolmogorov, A. “Zur Theorie der Markoffschen Ketten” [On the theory of Markov chains]. Mathematische Annalen 112.1 (1936), pp. 155–160.
Kolmogorov, A. “Three approaches to the quantitative definition of information”. International Journal of Computer Mathematics 2.1-4 (1968), pp. 157–168.
Krämer, N. and Hennig, P. “Stable Implementation of Probabilistic ODE Solvers”. arXiv:2012.10106 [stat.ML] (2020).
Krämer, N. and Hennig, P. “Linear-Time Probabilistic Solutions of Boundary Value Problems”. Advances in Neural Information Processing Systems, NeurIPS. 2021.
Krämer, N., Schmidt, J., and Hennig, P. “Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations”. Artificial Intelligence and Statistics (AISTATS). 2022.
Krämer, N. et al. “Probabilistic ODE Solutions in Millions of Dimensions”. arXiv:2110.11812 [stat.ML] (2021).
Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Tech. rep. 2009.
Kushner, H. J. “A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise”. Journal of Basic Engineering 86.1 (1964), pp. 97–106.
Kushner, H. J. “A versatile stochastic model of a function of unknown and time varying form”. Journal of Mathematical Analysis and Applications 5.1 (1962), pp. 150–167.
Lai, T. L. and Robbins, H. “Asymptotically efficient adaptive allocation rules”. Advances in Applied Mathematics 6.1 (1985), pp. 4–22.
Lanczos, C. “An iteration method for the solution of the eigenvalue problem of linear differential and integral operators”. Journal of Research of the National Bureau of Standards 45 (1950), pp. 255–282.
Laplace, P. Théorie Analytique des Probabilités [Analytical Theory of Probabilities]. 2nd edition. V. Courcier, Paris, 1814.
Larkin, F. “Gaussian measure in Hilbert space and applications in numerical analysis”. Rocky Mountain Journal of Mathematics 2.3 (1972).
Lauritzen, S. “Time series analysis in 1880: A discussion of contributions made by T. N. Thiele”. International Statistical Review / Revue Internationale de Statistique (1981), pp. 319–331.
Lauritzen, S. and Spiegelhalter, D. “Local computations with probabilities on graphical structures and their application to expert systems”. Journal of the Royal Statistical Society. Series B (Methodological) 50 (1988), pp. 157–224.
Le Cam, L. “Convergence of estimates under dimensionality restrictions”. Annals of Statistics 1 (1973), pp. 38–53.
Lecomte, C. “Exact statistics of systems with uncertainties: An analytical theory of rank-one stochastic dynamic systems”. Journal of Sound and Vibration 332.11 (2013), pp. 2750–2776.
Lemaréchal, C. “Cauchy and the Gradient Method”. Documenta Mathematica, Extra Volume: Optimization Stories (2012), pp. 251–254.
Lemieux, C. Monte Carlo and quasi-Monte Carlo sampling. Springer Science & Business Media, 2009.
Lie, H. C., Stahn, M., and Sullivan, T. J. “Randomised one-step time integration methods for deterministic operator differential equations”. Calcolo 59.1 (2022), p. 13.
Lie, H. C., Stuart, A. M., and Sullivan, T. J. “Strong convergence rates of probabilistic integrators for ordinary differential equations”. Statistics and Computing 29.6 (2019), pp. 1265–1283.
Lie, H. C., Sullivan, T. J., and Teckentrup, A. L. “Random Forward Models and Log-Likelihoods in Bayesian Inverse Problems”. SIAM/ASA Journal on Uncertainty Quantification 6.4 (2018), pp. 1600–1629.
Lindström, E., Madsen, H., and Nielsen, J. N. Statistics for Finance. Texts in Statistical Science. Chapman and Hall/CRC, 2015.
Lorenz, E. N. “Deterministic Nonperiodic Flow”. Journal of the Atmospheric Sciences 20.2 (1963), pp. 130–141.
Loveland, D. “A new interpretation of the von Mises’ concept of random sequence”. Mathematical Logic Quarterly 12.1 (1966), pp. 279–294.
Luenberger, D. Introduction to Linear and Nonlinear Programming. 2nd edition. Addison Wesley, 1984.
Lütkepohl, H. Handbook of Matrices. Wiley, 1996.
MacKay, D. “The evidence framework applied to classification networks”. Neural Computation 4.5 (1992), pp. 720–736.
MacKay, D. “Introduction to Gaussian processes”. NATO ASI Series F: Computer and Systems Sciences 168 (1998), pp. 133–166.
MacKay, D. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
MacKay, D. The Humble Gaussian Distribution. Tech. rep. Cavendish Laboratory, Cambridge University, 2006.
Magnani, E. et al. “Bayesian Filtering for ODEs with Bounded Derivatives”. arXiv:1709.08471 [cs.NA] (2017).
Mahsereci, M. “Probabilistic Approaches to Stochastic Optimization”. PhD thesis. Eberhard Karls Universität Tübingen, 2018.
Mahsereci, M. and Hennig, P. “Probabilistic Line Searches for Stochastic Optimization”. Advances in Neural Information Processing Systems, NeurIPS. 2015, pp. 181–189.
Mahsereci, M. and Hennig, P. “Probabilistic Line Searches for Stochastic Optimization”. Journal of Machine Learning Research 18.119 (2017), pp. 1–59.
Mahsereci, M. et al. “Early Stopping without a Validation Set”. arXiv:1703.09580 [cs.LG] (2017).
Mania, H., Guy, A., and Recht, B. “Simple random search provides a competitive approach to reinforcement learning”. arXiv:1803.07055 [cs.LG] (2018).
Marchant, R. and Ramos, F. “Bayesian Optimisation for Intelligent Environmental Monitoring”. NIPS workshop on Bayesian Optimization and Decision Making. 2012.
Marchant, R., Ramos, F., and Sanner, S. “Sequential Bayesian Optimisation for Spatial-Temporal Monitoring”. Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI. AUAI Press, 2014, pp. 553–562.
Markov, A. “Rasprostranenie zakona bol’shih chisel na velichiny, zavisyaschie drug ot druga (A generalization of the law of large numbers to variables that depend on each other)”. Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete (Proceedings of the Society for Physics and Mathematics at Kazan University) 15 (1906), pp. 135–156.
Marmin, S., Chevalier, C., and Ginsbourger, D. “Differentiating the multipoint Expected Improvement for optimal batch design”. arXiv:1503.05509 [stat.ML] (2015).
Marmin, S., Chevalier, C., and Ginsbourger, D. “Efficient batch-sequential Bayesian optimization with moments of truncated Gaussian vectors”. arXiv:1609.02700 [stat.ML] (2016).
Martínez R., H. J. “Local and Superlinear Convergence of Structured Secant Methods from the Convex Class”. PhD thesis. Rice University, 1988.
Matérn, B. “Spatial variation”. Meddelanden från Statens Skogsforskningsinstitut 49.5 (1960).
Matsuda, T. and Miyatake, Y. “Estimation of Ordinary Differential Equation Models with Discretization Error Quantification”. SIAM/ASA Journal on Uncertainty Quantification 9.1 (2021), pp. 302–331.
Matsumoto, M. and Nishimura, T. “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator”. ACM Transactions on Modeling and Computer Simulation (TOMACS) 8.1 (1998), pp. 3–30.
McLeod, M., Osborne, M. A., and Roberts, S. J. “Practical Bayesian Optimization for Variable Cost Objectives”. arXiv:1703.04335 [stat.ML] (2015).
Meister, A. Numerik Linearer Gleichungssysteme. Springer, 2011.
Minka, T. Deriving quadrature rules from Gaussian processes. Tech. rep. Statistics Department, Carnegie Mellon University, 2000.
Mitchell, M. An introduction to genetic algorithms. MIT Press, 1998.
Mitchell, T. M. The Need for Biases in Learning Generalizations. Tech. rep. CBM-TR 5-110. Rutgers University, 1980.
Mockus, J., Tiesis, V., and Žilinskas, A. “The Application of Bayesian Methods for Seeking the Extremum”. Toward Global Optimization. Vol. 2. Elsevier, 1978.
Moore, E. “On the reciprocal of the general algebraic matrix, abstract”. Bulletin of the American Mathematical Society 26 (1920), pp. 394–395.
Moré, J. J. “Recent developments in algorithms and software for trust region methods”. Mathematical Programming: The State of the Art. 1983, pp. 258–287.
Neal, R. “Annealed importance sampling”. Statistics and Computing 11.2 (2001), pp. 125–139.
Nesterov, Y. “A method of solving a convex programming problem with convergence rate O(1/k²)”. Soviet Mathematics Doklady 27.2 (1983), pp. 372–376.
Netzer, Y. et al. “Reading digits in natural images with unsupervised feature learning”. NIPS Workshop on Deep Learning and Unsupervised Feature Learning. 2011, p. 5.
Nickson, T. et al. “Automated Machine Learning on Big Data using Stochastic Algorithm Tuning”. arXiv:1407.7969 [stat.ML] (2014).
Nocedal, J. and Wright, S. Numerical Optimization. Springer Verlag, 1999.
Nordsieck, A. “On numerical integration of ordinary differential equations”. Mathematics of Computation 16.77 (1962), pp. 22–49.
Novak, E. Deterministic and stochastic error bounds in numerical analysis. Vol. 1349. Springer, 2006.
Nyström, E. “Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben”. Acta Mathematica 54.1 (1930), pp. 185–204.
O’Hagan, A. “Bayes–Hermite quadrature”. Journal of Statistical Planning and Inference (1991), pp. 245–260.
O’Hagan, A. “Some Bayesian Numerical Analysis”. Bayesian Statistics (1992), pp. 345–363.
O’Hagan, A. and Kingman, J. F. C. “Curve Fitting and Optimal Design for Prediction”. Journal of the Royal Statistical Society. Series B 40.1 (1978), pp. 1–42.
O’Neil, C. Weapons of math destruction: How big data increases inequality and threatens democracy. Crown, 2016.
Oates, C. J. and Sullivan, T. J. “A Modern Retrospective on Probabilistic Numerics”. Statistics and Computing 29.6 (2019), pp. 1335–1351.
Oates, C. J., Girolami, M., and Chopin, N. “Control functionals for Monte Carlo integration”. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79.3 (2017), pp. 695–718.
Oates, C. J. et al. “Convergence rates for a class of estimators based on Stein’s method”. Bernoulli 25.2 (2019a), pp. 1141–1159.
Oates, C. J. et al. “Bayesian Probabilistic Numerical Methods in Time-Dependent State Estimation for Industrial Hydrocyclone Equipment”. Journal of the American Statistical Association 114.528 (2019b), pp. 1518–1531.
Oesterle, J. et al. “Numerical uncertainty can critically affect simulations of mechanistic models in neuroscience”. bioRxiv (2021).
Øksendal, B. Stochastic Differential Equations: An Introduction with Applications. 6th edition. Springer, 2003.
Ortega, J. and Rheinboldt, W. Iterative solution of nonlinear equations in several variables. Vol. 30. Classics in Applied Mathematics. SIAM, 1970.
Osborne, M., Garnett, R., and Roberts, S. “Gaussian processes for global optimization”. 3rd International Conference on Learning and Intelligent Optimization (LION3). 2009.
Osborne, M. A. et al. “Active Learning of Model Evidence Using Bayesian Quadrature”. Advances in Neural Information Processing Systems, NeurIPS. 2012, pp. 46–54.
Owhadi, H. “Multigrid with Rough Coefficients and Multiresolution Operator Decomposition from Hierarchical Information Games”. SIAM Review 59.1 (2017), pp. 99–149.
Owhadi, H. and Scovel, C. “Conditioning Gaussian measure on Hilbert space”. arXiv:1506.04208 [math.PR] (2015).
Owhadi, H. and Scovel, C. “Toward Machine Wald”. Springer Handbook of Uncertainty Quantification. Springer, 2016, pp. 1–35.
Owhadi, H. and Zhang, L. “Gamblets for opening the complexity-bottleneck of implicit schemes for hyperbolic and parabolic ODEs/PDEs with rough coefficients”. Journal of Computational Physics 347 (2017), pp. 99–128.
Packel, E. and Traub, J. “Information-based complexity”. Nature 328.6125 (1987), pp. 29–33.
Paleyes, A. et al. “Emulation of physical processes with Emukit”. Second Workshop on Machine Learning and the Physical Sciences, NeurIPS. 2019.
Parlett, B. The Symmetric Eigenvalue Problem. Prentice-Hall, 1980.
Pasquale, F. The black box society: The secret algorithms that control money and information. Harvard University Press, 2015.
Patterson, D. “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink” (2022).
Paul, S., Osborne, M. A., and Whiteson, S. “Fingerprint Policy Optimisation for Robust Reinforcement Learning”. Proceedings of the 36th International Conference on Machine Learning, ICML. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 5082–5091.
Pearl, J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
Pitman, E. “Sufficient statistics and intrinsic accuracy”. Mathematical Proceedings of the Cambridge Philosophical Society 32.4 (1936), pp. 567–579.
Poincaré, H. Calcul des Probabilités. Gauthier-Villars, 1896.
Polyak, B. T. “Some methods of speeding up the convergence of iteration methods”. USSR Computational Mathematics and Mathematical Physics 4.5 (1964), pp. 1–17.
Powell, M. J. D. “A new algorithm for unconstrained optimization”. Nonlinear Programming. Academic Press, 1970.
Powell, M. J. D. “Convergence properties of a class of minimization algorithms”. Nonlinear Programming 2. 1975, pp. 1–27.
Press, W. et al. Numerical Recipes in Fortran 77: The Art of Scientific Computing. Cambridge University Press, 1992.
Quiñonero-Candela, J. and Rasmussen, C. “A unifying view of sparse approximate Gaussian process regression”. Journal of Machine Learning Research 6 (2005), pp. 1939–1959.
Rackauckas, C. et al. “A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions”. arXiv:1812.01892 [math.NA] (2018).
Rahimi, A. and Recht, B. “Random Features for Large-Scale Kernel Machines”. Advances in Neural Information Processing Systems, NeurIPS. Curran Associates, Inc., 2007, pp. 1177–1184.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. “Machine learning of linear differential equations using Gaussian processes”. Journal of Computational Physics 348 (2017), pp. 683–693.
Rasmussen, C. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.
Rauch, H., Striebel, C., and Tung, F. “Maximum likelihood estimates of linear dynamic systems”. Journal of the American Institute of Aeronautics and Astronautics (AIAA) 3.8 (1965), pp. 1445–1450.
Reid, W. Riccati differential equations. Elsevier, 1972.
Riccati, J. “Animadversiones in aequationes differentiales secundi gradus”. Actorum Eruditorum Supplementa 8 (1724), pp. 66–73.
Ritter, K. Average-case analysis of numerical problems. Lecture Notes in Mathematics 1733. Springer, 2000.
Robert, C. and Casella, G. Monte Carlo Statistical Methods. Springer Science & Business Media, 2013.
Rontsis, N., Osborne, M. A., and Goulart, P. J. “Distributionally Robust Optimization Techniques in Batch Bayesian Optimization”. Journal of Machine Learning Research 21.149 (2020), pp. 1–26.
Saad, Y. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
Saad, Y. and Schultz, M. “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems”. SIAM Journal on Scientific and Statistical Computing 7.3 (1986), pp. 856–869.
Sacks, J. and Ylvisaker, D. “Statistical designs and integral approximation”. Proc. 12th Bienn. Semin. Can. Math. Congr. 1970, pp. 115–136.
Sacks, J. and Ylvisaker, D. “Model robust design in regression: Bayes theory”. Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. Vol. 2. 1985, pp. 667–679.
Sard, A. Linear approximation. American Mathematical Society, 1963.
Särkkä, S. Bayesian filtering and smoothing. Cambridge University Press, 2013.
Särkkä, S. and Solin, A. Applied Stochastic Differential Equations. Cambridge University Press, 2019.
Särkkä, S., Solin, A., and Hartikainen, J. “Spatiotemporal learning via infinite-dimensional Bayesian filtering and smoothing: A look at Gaussian process regression through Kalman filtering”. IEEE Signal Processing Magazine 30.4 (2013), pp. 51–61.
Schmidt, J., Krämer, N., and Hennig, P. “A Probabilistic State Space Model for Joint Inference from Differential Equations and Data”. Advances in Neural Information Processing Systems, NeurIPS. 2021.
Schmidt, R. M., Schneider, F., and Hennig, P. “Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers”. Proceedings of the 38th International Conference on Machine Learning. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 9367–9376.
Schneider, F., Dangel, F., and Hennig, P. “Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks”. Advances in Neural Information Processing Systems, NeurIPS. 2021.
Schober, M., Duvenaud, D., and Hennig, P. “Probabilistic ODE Solvers with Runge-Kutta Means”. Advances in Neural Information Processing Systems, NeurIPS. 2014, pp. 739–747.
Schober, M., Särkkä, S., and Hennig, P. “A probabilistic model for the numerical solution of initial value problems”. Statistics and Computing 29.1 (2019), pp. 99–122.
Schober, M. et al. “Probabilistic shortest path tractography in DTI using Gaussian Process ODE solvers”. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, 2014.
Schölkopf, B. “The Kernel Trick for Distances”. Advances in Neural Information Processing Systems, NeurIPS. MIT Press, 2000, pp. 301–307.
Schölkopf, B. and Smola, A. Learning with Kernels. MIT Press, 2002.
Schur, I. “Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind”. Journal für die reine und angewandte Mathematik 147 (1917), pp. 205–232.
Schwartz, R. et al. “Green AI”. arXiv:1907.10597 [cs.CY] (2019).
Scieur, D. et al. “Integration Methods and Optimization Algorithms”. Advances in Neural Information Processing Systems, NeurIPS. 2017, pp. 1109–1118.
Shah, A. and Ghahramani, Z. “Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions”. Advances in Neural Information Processing Systems, NeurIPS. 2015, pp. 3330–3338.
Shahriari, B. et al. “An Entropy Search Portfolio for Bayesian Optimization”. arXiv:1406.4625 [stat.ML] (2014).
Shahriari, B. et al. “Taking the human out of the loop: A review of Bayesian optimization”. Proceedings of the IEEE 104.1 (2016), pp. 148–175.
Shanno, D. “Conditioning of quasi-Newton methods for function minimization”. Mathematics of Computation 24.111 (1970), pp. 647–656.
Skeel, R. “Equivalent Forms of Multistep Formulas”. Mathematics of Computation 33 (1979).
Skilling, J. “Bayesian solution of ordinary differential equations”. Maximum Entropy and Bayesian Methods (1991).
Smola, A. et al. “A Hilbert space embedding for distributions”. International Conference on Algorithmic Learning Theory. 2007, pp. 13–31.
Snelson, E. and Ghahramani, Z. “Sparse Gaussian Processes using Pseudo-inputs”. Advances in Neural Information Processing Systems, NeurIPS. 2005, pp. 1257–1264.
Snoek, J., Larochelle, H., and Adams, R. P. “Practical Bayesian Optimization of Machine Learning Algorithms”. Advances in Neural Information Processing Systems, NeurIPS. 2012, pp. 2960–2968.
Snoek, J. et al. “Scalable Bayesian Optimization Using Deep Neural Networks”. Proceedings of the 32nd International Conference on Machine Learning, ICML. Vol. 37. JMLR Workshop and Conference Proceedings. JMLR.org, 2015, pp. 2171–2180.
Solak, E. et al. “Derivative Observations in Gaussian Process Models of Dynamic Systems”. Advances in Neural Information Processing Systems, NeurIPS. MIT Press, 2002, pp. 1033–1040.
Solin, A. and Särkkä, S. “Explicit Link Between Periodic Covariance Functions and State Space Models”. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, AISTATS. Vol. 33. JMLR Workshop and Conference Proceedings. JMLR.org, 2014, pp. 904–912.
Sonneveld, P. “CGS, a fast Lanczos-type solver for nonsymmetric linear systems”. SIAM Journal on Scientific and Statistical Computing 10.1 (1989), pp. 36–52.
Spitzbart, A. “A generalization of Hermite’s Interpolation Formula”. The American Mathematical Monthly 67.1 (1960), pp. 42–46.
Springenberg, J. T. et al. “Bayesian Optimization with Robust Bayesian Neural Networks”. Advances in Neural Information Processing Systems, NeurIPS. 2016, pp. 4134–4142.
Srinivas, N. et al. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design”. Proceedings of the 27th International Conference on Machine Learning, ICML. Omnipress, 2010, pp. 1015–1022.
Stein, M. Interpolation of spatial data: some theory for Kriging. Springer Verlag, 1999.
Steinwart, I. “Convergence Types and Rates in Generic Karhunen-Loève Expansions with Applications to Sample Path Properties”. Potential Analysis 51.3 (2019), pp. 361–395.
Steinwart, I. and Christmann, A. Support Vector Machines. Springer Science & Business Media, 2008.
Streltsov, S. and Vakili, P. “A Non-myopic Utility Function for Statistical Global Optimization Algorithms”. Journal of Global Optimization 14.3 (1999), pp. 283–298.
Student. “The probable error of a mean”. Biometrika 6.1 (1908), pp. 1–25.
Sul’din, A. V. “Wiener measure and its applications to approximation methods. I”. Izvestiya Vysshikh Uchebnykh Zavedenii. Matematika 6 (1959), pp. 145–158.
Sul’din, A. V. “Wiener measure and its applications to approximation methods. II”. Izvestiya Vysshikh Uchebnykh Zavedenii. Matematika 5 (1960), pp. 165–179.
Sullivan, T. Introduction to uncertainty quantification. Vol. 63. Texts in Applied Mathematics. Springer, 2015.
Sutton, R. and Barto, A. Reinforcement Learning. MIT Press, 1998.
Swersky, K., Snoek, J., and Adams, R. P. “Freeze-Thaw Bayesian Optimization”. arXiv:1406.3896 [stat.ML] (2014).
Swersky, K. et al. “Raiders of the lost architecture: Kernels for Bayesian optimization in conditional parameter spaces”. NIPS Workshop on Bayesian Optimization in Theory and Practice (BayesOpt ’13). 2013.
Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, 2005.
Teschl, G. Ordinary Differential Equations and Dynamical Systems. Vol. 140. Graduate Studies in Mathematics. American Mathematical Society, 2012.
Teymur, O., Zygalakis, K., and Calderhead, B. “Probabilistic Linear Multistep Methods”. Advances in Neural Information Processing Systems, NeurIPS. 2016, pp. 4314–4321.
Teymur, O. et al. “Implicit Probabilistic Integrators for ODEs”. Advances in Neural Information Processing Systems, NeurIPS. 2018, pp. 7255–7264.
Thaler, R. H. “Anomalies: The Winner’s Curse”. Journal of Economic Perspectives 2.1 (1988), pp. 191–202.
Thiele, T. “Om Anvendelse af mindste Kvadraters Methode i nogle Tilfælde, hvor en Komplikation af visse Slags uensartede tilfældige Fejlkilder giver Fejlene en ‘systematisk’ Karakter”. Det Kongelige Danske Videnskabernes Selskabs Skrifter – Naturvidenskabelig og Mathematisk Afdeling (1880), pp. 381–408.
Traub, J., Wasilkowski, G., and Woźniakowski, H. Information, Uncertainty, Complexity. Addison-Wesley Publishing Company, 1983.
Trefethen, L. and Bau, D., III. Numerical Linear Algebra. SIAM, 1997.
Tronarp, F., Bosch, N., and Hennig, P. “Fenrir: Physics-Enhanced Regression for Initial Value Problems”. arXiv:2202.01287 [cs.LG] (2022).
Tronarp, F., Särkkä, S., and Hennig, P. “Bayesian ODE solvers: The maximum a posteriori estimate”. Statistics and Computing 31.3 (2021), pp. 1–18.
Tronarp, F. et al. “Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective”. Statistics and Computing 29.6 (2019), pp. 1297–1315.
Turing, A. “Rounding-off errors in matrix processes”. Quarterly Journal of Mechanics and Applied Mathematics 1.1 (1948), pp. 287–308.
Uhlenbeck, G. and Ornstein, L. “On the theory of the Brownian motion”. Physical Review 36.5 (1930), p. 823.
van Loan, C. “The ubiquitous Kronecker product”. Journal of Computational and Applied Mathematics 123 (2000), pp. 85–100.
Vijayakumar, S. and Schaal, S. “Locally Weighted Projection Regression: Incremental Real Time Learning in High Dimensional Space”. Proceedings of the Seventeenth International Conference on Machine Learning, ICML. Morgan Kaufmann, 2000, pp. 1079–1086.
Villemonteix, J., Vazquez, E., and Walter, E. “An informational approach to the global optimization of expensive-to-evaluate functions”. Journal of Global Optimization 44.4 (2009), pp. 509–534.
Von Neumann, J. “Various techniques used in connection with random digits”. Monte Carlo Method. Vol. 12. National Bureau of Standards Applied Mathematics Series. 1951, pp. 36–38.
Wahba, G. Spline models for observational data. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1990.
Wang, J., Cockayne, J., and Oates, C. “On the Bayesian Solution of Differential Equations”. Proceedings of the 38th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (2018).
Wang, J. et al. “Parallel Bayesian Global Optimization of Expensive Functions”. arXiv:1602.05149 [stat.ML] (2016).
Wang, J. et al. “Bayesian numerical methods for nonlinear partial differential equations”. Statistics and Computing 31.55 (2021).
Wang, Z. and Jegelka, S. “Max-value Entropy Search for Efficient Bayesian Optimization”. Proceedings of the 34th International Conference on Machine Learning, ICML. Vol. 70. Proceedings of Machine Learning Research. PMLR, 2017, pp. 3627–3635.
Warth, W. and Werner, J. “Effiziente Schrittweitenfunktionen bei unrestringierten Optimierungsaufgaben”. Computing 19.1 (1977), pp. 59–72.
Weise, T. “Global optimization algorithms – theory and application”. Self-published (2009), pp. 25–26.
Wendland, H. and Rieger, C. “Approximate Interpolation with Applications to Selecting Smoothing Parameters”. Numerische Mathematik 101.4 (2005), pp. 729–748.
Wenger, J. and Hennig, P. “Probabilistic Linear Solvers for Machine Learning”. Advances in Neural Information Processing Systems, NeurIPS (2020).
Wenger, J. et al. “ProbNum: Probabilistic Numerics in Python”. 2021.
Werner, J. “Über die globale Konvergenz von Variable-Metrik-Verfahren mit nicht-exakter Schrittweitenbestimmung”. Numerische Mathematik 31.3 (1978), pp. 321–334.
Wiener, N. “Differential space”. Journal of Mathematics and Physics 2 (1923), pp. 131–174.
Wilkinson, J. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.
Williams, C. K. I. and Seeger, M. W. “Using the Nyström Method to Speed Up Kernel Machines”. Advances in Neural Information Processing Systems, NeurIPS. MIT Press, 2000, pp. 682–688.
Wills, A. G. and Schön, T. B. “On the construction of probabilistic Newton-type algorithms”. IEEE Conference on Decision and Control (CDC). Vol. 56. 2017.
Winfield, D. H. “Function and functional optimization by interpolation in data tables”. PhD thesis. Harvard University, 1970.
Wishart, J. “The generalised product moment distribution in samples from a normal multivariate population”. Biometrika (1928), pp. 32–52.
Wolfe, P. “Convergence conditions for ascent methods”. SIAM Review 11.2 (1969), pp. 226–235.
Wu, J. and Frazier, P. I. “The Parallel Knowledge Gradient Method for Batch Bayesian Optimization”. Advances in Neural Information Processing Systems, NeurIPS. 2016, pp. 3126–3134.
Wu, J. et al. “Bayesian Optimization with Gradients”. Advances in Neural Information Processing Systems, NeurIPS. 2017, pp. 5267–5278.
Zeiler, M. D. “ADADELTA: An Adaptive Learning Rate Method”. arXiv:1212.5701 [cs.LG] (2012).