Self-concordant Smoothing for Large-Scale Convex Composite Optimization

📅 2023-09-04

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper addresses large-scale convex composite optimization: minimizing the sum of a smooth objective and a nonsmooth structured regularizer (e.g., ℓ₁ or group lasso). It proposes a "self-concordant smoothing" framework that replaces the nonsmooth term with a differentiable approximation and, from the structure of the resulting problem, derives a variable metric and a step-length rule suited to proximal Newton-type methods. The main algorithmic contribution is Prox-GGN-SCORE, a proximal generalized Gauss–Newton algorithm that avoids most of the computational cost associated with the inverse Hessian and comes with a convergence proof. Experiments on synthetic and real datasets indicate that the approach is more efficient than existing solvers. An open-source Julia package implementing the algorithms is available, and the approximation used in Prox-GGN-SCORE is aimed at overparameterized models and mini-batch settings.
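
In symbols, the composite problem described above can be written as follows. The symbols $f$, $g$, $\lambda$, and the group collection $\mathcal{G}$ are notation assumed here for illustration and are not taken from the paper.

```latex
\min_{x \in \mathbb{R}^n} \; f(x) + g(x),
\qquad
g(x) = \lambda \|x\|_1
\quad \text{or} \quad
g(x) = \lambda \sum_{G \in \mathcal{G}} \|x_G\|_2 ,
```

where $f$ is convex and smooth, $g$ is convex and possibly nonsmooth, and $x_G$ denotes the subvector of $x$ indexed by the group $G$.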

📝 Abstract

We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth. The key highlight of our approach is a natural property of the resulting problem's structure, which provides us with a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms. In addition, we efficiently handle specific structures promoted by the nonsmooth function, such as $\ell_1$-regularization and group-lasso penalties. We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm. The Prox-GGN-SCORE algorithm highlights an important approximation procedure which helps to significantly reduce most of the computational overhead associated with the inverse Hessian. This approximation is especially useful for overparameterized machine learning models and in mini-batch settings. Numerical examples on both synthetic and real datasets demonstrate the efficiency of our approach and its superiority over existing approaches. A Julia package implementing the proposed algorithms is available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.
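
For readers unfamiliar with the two structures named in the abstract, the sketch below shows the standard proximal operators for the $\ell_1$ and group-lasso penalties, plus a plain proximal gradient loop (ISTA) that uses the first one. This is a textbook baseline, not Prox-N-SCORE or Prox-GGN-SCORE, and it does not use the API of the Julia package; the function names and the least-squares objective are assumptions chosen for illustration.

```python
import numpy as np


def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_group_lasso(v, t, groups):
    """Block soft-thresholding: prox of t * sum_G ||x_G||_2 (disjoint groups)."""
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        block = v[idx]
        norm = np.linalg.norm(block)
        if norm > t:
            # Shrink the whole block toward zero; blocks with norm <= t stay zero.
            out[idx] = (1.0 - t / norm) * block
    return out


def proximal_gradient_lasso(A, b, lam, iters=200):
    """Plain proximal gradient (ISTA) for 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)  # gradient of the smooth least-squares term
        x = prox_l1(x - step * grad, step * lam)
    return x
```

The $\ell_1$ operator zeroes individual coordinates, while the group version zeroes or keeps entire blocks, which is the structural difference between the two penalties.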

Problem

Research questions and friction points this paper is trying to address.

Develop self-concordant smoothing for convex composite optimization

Design proximal Newton-type methods with variable-metric and step-length selection

Handle nonsmooth structures such as ℓ₁ regularization and group lasso efficiently

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-concordant smoothing for convex composite optimization

Variable-metric and step-length selection for proximal Newton-type methods

Low-rank Hessian inverse approximation for overparameterized models
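
To make the ideas of smoothing a nonsmooth term and working in a variable metric more concrete, a minimal sketch follows. It uses the pseudo-Huber function as a stand-in smooth approximation of $|x|$, and a diagonal metric for the proximal step. Whether either choice matches the paper's smoothing functions or metric is an assumption; the names `pseudo_huber`, `mu`, and `scaled_prox_l1` are hypothetical.

```python
import numpy as np


def pseudo_huber(x, mu):
    """Smooth surrogate for ||x||_1: sum_i (sqrt(x_i^2 + mu^2) - mu)."""
    return np.sum(np.sqrt(x**2 + mu**2) - mu)


def pseudo_huber_grad(x, mu):
    """Gradient of the surrogate, componentwise x_i / sqrt(x_i^2 + mu^2)."""
    return x / np.sqrt(x**2 + mu**2)


def pseudo_huber_curvature(x, mu):
    """Diagonal of the surrogate's Hessian: mu^2 / (x_i^2 + mu^2)^(3/2)."""
    return mu**2 / (x**2 + mu**2) ** 1.5


def scaled_prox_l1(v, lam, d):
    """Prox of lam * ||x||_1 in the metric diag(d), with d > 0.

    Solves min_x lam * ||x||_1 + 0.5 * sum_i d_i * (x_i - v_i)^2, which
    separates per coordinate into soft-thresholding with threshold lam / d_i.
    """
    t = lam / d
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

The surrogate satisfies $\|x\|_1 - n\mu \le$ `pseudo_huber(x, mu)` $\le \|x\|_1$ for $x \in \mathbb{R}^n$, so a smaller `mu` gives a tighter approximation, at the price of larger curvature near zero (the second derivative at $x_i = 0$ is $1/\mu$). In `scaled_prox_l1`, coordinates with larger metric entries `d[i]` are thresholded less aggressively.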
