Bibliography

Stephen J. Wright; Benjamin Recht

doi:10.1017/9781009004282.014

Published online by Cambridge University Press: 31 March 2022

Stephen J. Wright, University of Wisconsin, Madison
Benjamin Recht, University of California, Berkeley

Type: Chapter
Information: Optimization for Data Analysis, pp. 216–222

DOI: https://doi.org/10.1017/9781009004282.014

Publisher: Cambridge University Press

Print publication year: 2022

References

Allen-Zhu, Z. 2017. Katyusha: The first direct acceleration of stochastic gradient methods. Journal of Machine Learning Research, 18(1), 8194–8244.

Attouch, H., Chbani, Z., Peypouquet, J., and Redont, P. 2018. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Mathematical Programming, 168(1–2), 123–175.

Beck, A., and Teboulle, M. 2003. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31, 167–175.

Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.

Beck, A., and Tetruashvili, L. 2013. On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4), 2037–2060.

Bertsekas, D. P. 1976. On the Goldstein-Levitin-Polyak gradient projection method. IEEE Transactions on Automatic Control, AC-21, 174–184.

Bertsekas, D. P. 1982. Constrained Optimization and Lagrange Multiplier Methods. New York: Academic Press.

Bertsekas, D. P. 1997. A new class of incremental gradient methods for least squares problems. SIAM Journal on Optimization, 7(4), 913–926.

Bertsekas, D. P. 1999. Nonlinear Programming. Second edition. Belmont, MA: Athena Scientific.

Bertsekas, D. P. 2011. Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Pages 85–119 of: Sra, S., Nowozin, S., and Wright, S. J. (eds), Optimization for Machine Learning. NIPS Workshop Series. Cambridge, MA: MIT Press.

Bertsekas, D. P., and Tsitsiklis, J. N. 1989. Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ: Prentice Hall.

Bertsekas, D. P., Nedić, A., and Ozdaglar, A. E. 2003. Convex Analysis and Optimization. Optimization and Computation Series. Belmont, MA: Athena Scientific.

Blatt, D., Hero, A. O., and Gauchman, H. 2007. A convergent incremental gradient method with a constant step size. SIAM Journal on Optimization, 18(1), 29–51.

Bolte, J., and Pauwels, E. 2021. Conservative set valued fields, automatic differentiation, stochastic gradient methods, and deep learning. Mathematical Programming, 188(1), 19–51.

Boser, B. E., Guyon, I. M., and Vapnik, V. N. 1992. A training algorithm for optimal margin classifiers. Pages 144–152 of: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. Pittsburgh, PA: ACM Press.

Boyd, S., and Vandenberghe, L. 2003. Convex Optimization. Cambridge: Cambridge University Press.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.

Bubeck, S., Lee, Y. T., and Singh, M. 2015. A geometric alternative to Nesterov’s accelerated gradient descent. Technical Report arXiv:1506.08187. Microsoft Research.

Burachik, R. S., and Jeyakumar, V. 2005. A simple closure condition for the normal cone intersection formula. Proceedings of the American Mathematical Society, 133(6), 1741–1748.

Burer, S., and Monteiro, R. D. C. 2003. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorizations. Mathematical Programming, Series B, 95, 329–357.

Burke, J. V., and Engle, A. 2018. Line search methods for convex-composite optimization. Technical Report arXiv:1806.05218. Department of Mathematics, University of Washington.

Candès, E., and Recht, B. 2009. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9, 717–772.

Chouzenoux, E., Pesquet, J.-C., and Repetti, A. 2016. A block coordinate variable metric forward-backward algorithm. Journal of Global Optimization, 66, 457–485.

Conn, A. R., Gould, N. I. M., and Toint, P. L. 1992. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization. Springer Series in Computational Mathematics, vol. 17. Heidelberg: Springer-Verlag.

Cortes, C., and Vapnik, V. N. 1995. Support-vector networks. Machine Learning, 20, 273–297.

Danskin, J. M. 1967. The Theory of Max-Min and Its Application to Weapons Allocation Problems. Springer.

Davis, D., Drusvyatskiy, D., Kakade, S., and Lee, J. D. 2020. Stochastic subgradient method converges on tame functions. Foundations of Computational Mathematics, 20(1), 119–154.

Defazio, A., Bach, F., and Lacoste-Julien, S. 2014. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Pages 1646–1654 of: Advances in Neural Information Processing Systems, November 2014, Montreal, Canada.

Dem’yanov, V. F., and Rubinov, A. M. 1967. The minimization of a smooth convex functional on a convex set. SIAM Journal on Control, 5(2), 280–294.

Dem’yanov, V. F., and Rubinov, A. M. 1970. Approximate Methods in Optimization Problems. Vol. 32. New York: Elsevier.

Drusvyatskiy, D., Fazel, M., and Roy, S. 2018. An optimal first order method based on optimal quadratic averaging. SIAM Journal on Optimization, 28(1), 251–271.

Dunn, J. C. 1980. Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM Journal on Control and Optimization, 18(5), 473–487.

Dunn, J. C. 1981. Global and asymptotic convergence rate estimates for a class of projected gradient processes. SIAM Journal on Control and Optimization, 19(3), 368–400.

Eckstein, J., and Bertsekas, D. P. 1992. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55, 293–318.

Eckstein, J., and Yao, W. 2015. Understanding the convergence of the alternating direction method of multipliers: Theoretical and computational perspectives. Pacific Journal of Optimization, 11(4), 619–644.

Fercoq, O., and Richtarik, P. 2015. Accelerated, parallel, and proximal coordinate descent. SIAM Journal on Optimization, 25, 1997–2023.

Fletcher, R., and Reeves, C. M. 1964. Function minimization by conjugate gradients. Computer Journal, 7, 149–154.

Frank, M., and Wolfe, P. 1956. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3, 95–110.

Gabay, D., and Mercier, B. 1976. A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Computers and Mathematics with Applications, 2, 17–40.

Gelfand, I. 1941. Normierte Ringe. Recueil Mathématique [Matematicheskii Sbornik], 9, 3–24.

Glowinski, R., and Marrocco, A. 1975. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique, et Recherche Opérationnelle, 9, 41–76.

Goldstein, A. A. 1964. Convex programming in Hilbert space. Bulletin of the American Mathematical Society, 70, 709–710.

Goldstein, A. A. 1974. On gradient projection. Pages 38–40 of: Proceedings of the 12th Allerton Conference on Circuit and System Theory, Allerton Park, Illinois.

Golub, G. H., and van Loan, C. F. 1996. Matrix Computations. Third edition. Baltimore: The Johns Hopkins University Press.

Griewank, A., and Walther, A. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Second edition. Frontiers in Applied Mathematics. Philadelphia, PA: SIAM.

Hestenes, M. R. 1969. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4, 303–320.

Hestenes, M., and Stiefel, E. 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409–436.

Hu, B., Wright, S. J., and Lessard, L. 2018. Dissipativity theory for accelerating stochastic variance reduction: A unified analysis of SVRG and Katyusha using semidefinite programs. Pages 2038–2047 of: International Conference on Machine Learning (ICML).

Jaggi, M. 2013. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. Pages 427–435 of: International Conference on Machine Learning (ICML).

Jain, P., Netrapalli, P., Kakade, S. M., Kidambi, R., and Sidford, A. 2018. Accelerating stochastic gradient descent for least squares regression. Pages 545–604 of: Conference on Learning Theory (COLT).

Johnson, R., and Zhang, T. 2013. Accelerating stochastic gradient descent using predictive variance reduction. Pages 315–323 of: Advances in Neural Information Processing Systems.

Kaczmarz, S. 1937. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l’Académie Polonaise des Sciences et des Lettres. Classe des Sciences Mathématiques et Naturelles. Série A, Sciences Mathématiques, 35, 355–357.

Karimi, H., Nutini, J., and Schmidt, M. 2016. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. Pages 795–811 of: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer.

Kiwiel, K. C. 1990. Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming, 46(1–3), 105–122.

Kurdyka, K. 1998. On gradients of functions definable in o-minimal structures. Annales de l’Institut Fourier, 48, 769–783.

Lang, S. 1983. Real Analysis. Second edition. Reading, MA: Addison-Wesley.

Le Roux, N., Schmidt, M., and Bach, F. 2012. A stochastic gradient method with an exponential convergence rate for finite training sets. Advances in Neural Information Processing Systems, 25, 2663–2671.

Lee, C.-P., and Wright, S. J. 2018. Random permutations fix a worst case for cyclic coordinate descent. IMA Journal of Numerical Analysis, 39, 1246–1275.

Lee, Y. T., and Sidford, A. 2013. Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. Pages 147–156 of: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science. IEEE.

Lemaréchal, C. 1975. An extension of Davidon methods to non differentiable problems. Pages 95–109 of: Nondifferentiable Optimization. Springer.

Lemaréchal, C., Nemirovskii, A., and Nesterov, Y. 1995. New variants of bundle methods. Mathematical Programming, 69(1–3), 111–147.

Lessard, L., Recht, B., and Packard, A. 2016. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1), 57–95.

Levitin, E. S., and Polyak, B. T. 1966. Constrained minimization problems. USSR Journal of Computational Mathematics and Mathematical Physics, 6, 1–50.

Li, X., Zhao, T., Arora, R., Liu, H., and Hong, M. 2018. On faster convergence of cyclic block coordinate descent-type methods for strongly convex minimization. Journal of Machine Learning Research, 18, 1–24.

Liu, J., and Wright, S. J. 2015. Asynchronous stochastic coordinate descent: Parallelism and convergence properties. SIAM Journal on Optimization, 25(1), 351–376.

Liu, J., Wright, S. J., Ré, C., Bittorf, V., and Sridhar, S. 2015. An asynchronous parallel stochastic coordinate descent algorithm. Journal of Machine Learning Research, 16, 285–322.

Łojasiewicz, S. 1963. Une propriété topologique des sous-ensembles analytiques réels. Les Équations aux Dérivées Partielles, 117, 87–89.

Lu, Z., and Xiao, L. 2015. On the complexity analysis of randomized block-coordinate descent methods. Mathematical Programming, Series A, 152, 615–642.

Luo, Z.-Q., Sturm, J. F., and Zhang, S. 2000. Conic convex programming and self-dual embedding. Optimization Methods and Software, 14, 169–218.

Maddison, C. J., Paulin, D., Teh, Y. W., O’Donoghue, B., and Doucet, A. 2018. Hamiltonian descent methods. arXiv preprint arXiv:1809.05042.

Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A. 2009. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4), 1574–1609.

Nesterov, Y. 1983. A method for unconstrained convex problem with the rate of convergence O(1/k²). Doklady AN SSSR, 269, 543–547.

Nesterov, Y. 2004. Introductory Lectures on Convex Optimization: A Basic Course. Boston: Kluwer Academic Publishers.

Nesterov, Y. 2012. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(January), 341–362.

Nesterov, Y. 2015. Universal gradient methods for convex optimization problems. Mathematical Programming, 152(1–2), 381–404.

Nesterov, Y., and Nemirovskii, A. S. 1994. Interior Point Polynomial Methods in Convex Programming. Philadelphia, PA: SIAM.

Nesterov, Y., and Stich, S. U. 2017. Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM Journal on Optimization, 27(1), 110–123.

Nocedal, J., and Wright, S. J. 2006. Numerical Optimization. Second edition. New York: Springer.

Parikh, N., and Boyd, S. 2013. Proximal algorithms. Foundations and Trends in Optimization, 1(3), 123–231.

Polyak, B. T. 1963. Gradient methods for minimizing functionals (in Russian). Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 643–653.

Polyak, B. T. 1964. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4, 1–17.

Powell, M. J. D. 1969. A method for nonlinear constraints in minimization problems. Pages 283–298 of: Fletcher, R. (ed), Optimization. New York: Academic Press.

Rao, C. V., Wright, S. J., and Rawlings, J. B. 1998. Application of interior-point methods to model predictive control. Journal of Optimization Theory and Applications, 99, 723–757.

Recht, B., Fazel, M., and Parrilo, P. 2010. Guaranteed minimum-rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

Richtarik, P., and Takac, M. 2014. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, Series A, 144(1), 1–38.

Richtarik, P., and Takac, M. 2016a. Distributed coordinate descent method for learning with big data. Journal of Machine Learning Research, 17, 1–25.

Richtarik, P., and Takac, M. 2016b. Parallel coordinate descent methods for big data optimization. Mathematical Programming, Series A, 156, 433–484.

Robbins, H., and Monro, S. 1951. A stochastic approximation method. Annals of Mathematical Statistics, 22(3), 400–407.

Rockafellar, R. T. 1970. Convex Analysis. Princeton, NJ: Princeton University Press.

Rockafellar, R. T. 1973. The multiplier method of Hestenes and Powell applied to convex programming. Journal of Optimization Theory and Applications, 12(6), 555–562.

Rockafellar, R. T. 1976a. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research, 1, 97–116.

Rockafellar, R. T. 1976b. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14, 877–898.

Rosenblatt, F. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386.

Shalev-Shwartz, S., Singer, Y., Srebro, N., and Cotter, A. 2011. Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1), 3–30.

Shi, B., Du, S. S., Jordan, M. I., and Su, W. J. 2018. Understanding the acceleration phenomenon via high-resolution differential equations. arXiv preprint arXiv:1810.08907.

Sion, M. 1958. On general minimax theorems. Pacific Journal of Mathematics, 8(1), 171–176.

Stellato, B., Banjac, G., Goulart, P., Bemporad, A., and Boyd, S. 2020. OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 12(4), 637–672.

Strohmer, T., and Vershynin, R. 2009. A randomized Kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, 15(2), 262.

Su, W., Boyd, S., and Candès, E. 2014. A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. Pages 2510–2518 of: Advances in Neural Information Processing Systems.

Sun, R., and Hong, M. 2015. Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. Pages 1306–1314 of: Advances in Neural Information Processing Systems.

Teo, C. H., Vishwanathan, S. V. N., Smola, A., and Le, Q. V. 2010. Bundle methods for regularized risk minimization. Journal of Machine Learning Research, 11(1), 311–365.

Tibshirani, R. 1996. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B, 58, 267–288.

Todd, M. J. 2001. Semidefinite optimization. Acta Numerica, 10, 515–560.

Tseng, P., and Yun, S. 2010. A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Computational Optimization and Applications, 47(2), 179–206.

Vandenberghe, L. 2016. Slides for EE236C: Optimization Methods for Large-Scale Systems.

Vandenberghe, L., and Boyd, S. 1996. Semidefinite programming. SIAM Review, 38, 49–95.

Vapnik, V. 1992. Principles of risk minimization for learning theory. Pages 831–838 of: Advances in Neural Information Processing Systems.

Vapnik, V. 2013. The Nature of Statistical Learning Theory. Berlin: Springer Science & Business Media.

Wibisono, A., Wilson, A. C., and Jordan, M. I. 2016. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47), E7351–E7358.

Wolfe, P. 1975. A method of conjugate subgradients for minimizing nondifferentiable functions. Pages 145–173 of: Nondifferentiable Optimization. Springer.

Wright, S. J. 1997. Primal-Dual Interior-Point Methods. Philadelphia, PA: SIAM.

Wright, S. J. 2012. Accelerated block-coordinate relaxation for regularized optimization. SIAM Journal on Optimization, 22(1), 159–186.

Wright, S. J. 2018. Optimization algorithms for data analysis. Pages 49–97 of: Mahoney, M., Duchi, J. C., and Gilbert, A. (eds), The Mathematics of Data. IAS/Park City Mathematics Series, vol. 25. AMS.

Wright, S. J., and Lee, C.-P. 2020. Analyzing random permutations for cyclic coordinate descent. Mathematics of Computation, 89, 2217–2248.

Wright, S. J., Nowak, R. D., and Figueiredo, M. A. T. 2009. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(August), 2479–2493.

Zhang, T. 2004. Solving large scale linear prediction problems using stochastic gradient descent algorithms. Page 116 of: Proceedings of the Twenty-First International Conference on Machine Learning.
