This chapter reviews a collection of sparse optimization models and algorithms. The review focuses on introducing these algorithms along with their motivations and basic properties. The mathematical tools from the previous section will be heavily relied upon.
This chapter is organized as follows. Section 4.1 reviews a list of (convex) sparse optimization models, which deal with different types of signals and different kinds of noise; they can also include additional features as objectives and constraints that arise in practical problems. Section 4.2 demonstrates that convex sparse optimization problems can be transformed into equivalent cone programs and solved by off-the-shelf algorithms; yet it argues that these algorithms are usually inefficient for large-scale problems. Sections 4.3–4.12 cover a large variety of algorithms for sparse optimization. These algorithms have different strengths and fit different applications. Their efficiency comes, for the most part, from the use of shrinkage-like operations, which often have closed forms and are introduced in Section 4.4. Then, Section 4.5 presents a prox-linear framework and gives several algorithms under this framework that are based on gradient descent and take advantage of shrinkage-like operations. These algorithms can often be linked with the BSUM framework discussed in Chapter 3. Duality is a very powerful tool in modern convex optimization, and sparse optimization is no exception. Section 4.6 derives a few dual models and algorithms for sparse optimization and discusses their properties. One class of dual algorithms (namely, the ADMM algorithm introduced in Chapter 3) is very efficient and extremely versatile, applicable to nearly all convex sparse optimization problems. The framework and applications of these algorithms are given in Section 4.8. Unlike other algorithms, the homotopy algorithms in Section 4.9 produce not just one solution but a path of solutions for the LASSO model (which is given in (4.2b) below); not only are they efficient at producing multiple solutions corresponding to different parameter values, they are also especially fast if the solution is extremely sparse.
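To make the role of shrinkage concrete, the following minimal Python sketch shows one common closed-form shrinkage operation (soft-thresholding) and a gradient step followed by shrinkage, in the spirit of the prox-linear idea mentioned above. It is an illustration under stated assumptions rather than the algorithms of Sections 4.4 and 4.5: the objective is assumed to be lam*||x||_1 + (1/2)*||Ax - b||_2^2, the function names are hypothetical, and the caller is assumed to supply a stepsize no larger than 1/||A||_2^2.

```python
def soft_threshold(v, tau):
    """Shrink every entry of v toward zero by tau; entries within tau of zero become 0."""
    return [max(abs(t) - tau, 0.0) * (1.0 if t > 0 else -1.0) for t in v]


def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]


def prox_linear_l1(A, b, lam, step, n_iter=200):
    """Sketch: minimize lam*||x||_1 + 0.5*||Ax - b||^2 by gradient step + shrinkage.

    Assumption: step <= 1 / ||A||_2^2, chosen by the caller.
    """
    At = [list(col) for col in zip(*A)]          # transpose of A
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(A, x), b)]
        grad = matvec(At, residual)              # gradient of the smooth term
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)
    return x
```

When A is the identity and the stepsize is 1, a single iteration already returns the shrinkage of b, which is why each iteration is so cheap: beyond the two matrix-vector products, the nonsmooth term costs only one pass over the entries.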
All the above algorithms can be accelerated, often significantly, by appropriately setting their stepsize parameters, which determine the amount of progress made at each iteration. Such techniques are reviewed in Section 4.10. Unlike all the previous algorithms, greedy algorithms do not necessarily correspond to an optimization model; instead of systematically searching for solutions that minimize objective functions, they build sparse solutions by constructing their supports step by step in a greedy manner.
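As a rough illustration of building a support step by step, the sketch below implements plain matching pursuit in Python. This particular method is chosen here for brevity and is an assumption, not necessarily one of the greedy algorithms the chapter covers; the function names are hypothetical, and every column of A is assumed to be nonzero.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(A, b, n_steps):
    """Greedy sketch: repeatedly pick the column of A most correlated with the residual.

    A is a list of rows with nonzero columns (assumption); returns (x, support).
    """
    cols = [list(c) for c in zip(*A)]
    x = [0.0] * len(cols)
    residual = list(b)
    support = []
    for _ in range(n_steps):
        # Normalized correlation of each column with the current residual.
        scores = [abs(dot(c, residual)) / dot(c, c) ** 0.5 for c in cols]
        j = max(range(len(cols)), key=lambda k: scores[k])
        coef = dot(cols[j], residual) / dot(cols[j], cols[j])
        x[j] += coef                              # update the chosen coefficient only
        residual = [r - coef * c for r, c in zip(residual, cols[j])]
        if j not in support:
            support.append(j)                     # the support grows one index at a time
    return x, support
```

No objective function is minimized explicitly: each step only selects an index and removes that column's contribution from the residual.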