Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the optimization instability of ℓ_p regularization with 0 < p < 1, which arises because the gradient of the penalty is unbounded near zero. The authors propose ReWA, a method that combines reparameterization, weight decay, and adaptive learning rates. ReWA is closely connected to ℓ_p regularization but exposes a distinct optimization landscape that helps mitigate this instability. Experiments on CIFAR-10 and ImageNet with ResNet architectures show that ReWA attains significantly higher sparsity than ℓ_1 regularization while preserving test accuracy.
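
To make the "unbounded gradients near zero" point concrete, here is a minimal sketch that evaluates the gradient of the penalty |w|^p at progressively smaller weights. The function name `lp_penalty_grad` and the sample weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lp_penalty_grad(w, p):
    """Gradient of sum(|w|**p) with respect to w, valid for w != 0.

    Illustrative helper (hypothetical name, not from the paper).
    """
    w = np.asarray(w, dtype=float)
    return p * np.sign(w) * np.abs(w) ** (p - 1.0)

# Weights shrinking toward zero.
weights = np.array([1.0, 1e-2, 1e-4, 1e-8])

grad_half = lp_penalty_grad(weights, p=0.5)  # grows like |w|**-0.5
grad_one = lp_penalty_grad(weights, p=1.0)   # magnitude stays at 1

for w, g_half, g_one in zip(weights, grad_half, grad_one):
    print(f"w={w:.0e}  grad(p=0.5)={g_half:.3g}  grad(p=1)={g_one:.3g}")
```

For p = 0.5 the gradient magnitude scales as |w|^(-1/2), so any fixed step size eventually overshoots as a weight approaches zero, whereas the ℓ_1 gradient stays bounded.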

📝 Abstract

Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, it may encounter optimization instability due to the unbounded gradients when $0<p<1$. In this paper, we introduce a novel approach to sparse optimization termed ReWA, based on Reparameterization, Weight decay, and Adaptive learning rate. ReWA is closely connected to $\ell_p$-regularization, yet it unveils a distinct optimization landscape that helps mitigate instability issues. Experiments on CIFAR-10 and ImageNet with ResNets demonstrate that ReWA leads to significant sparsity improvements over the $\ell_1$-regularization approach while preserving test accuracy.
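
One well-known way in which reparameterization and weight decay connect to sparsity-inducing penalties is the product parameterization $w = uv$: by the AM-GM inequality, $\tfrac{1}{2}(u^2 + v^2) \ge |uv| = |w|$, with equality when $|u| = |v|$, so weight decay on the factors acts like an $\ell_1$ penalty on $w$. With $k$ factors, the mean of the squared factors is bounded below by $|w|^{2/k}$, which corresponds to $p < 1$ when $k \ge 3$. The sketch below checks the two-factor case numerically. It illustrates this general identity only; the abstract does not specify ReWA's reparameterization, and the function name and search grid are assumptions.

```python
import numpy as np

def min_weight_decay(w, num_points=200001):
    """Numerically minimise 0.5 * (u**2 + v**2) subject to u * v == w.

    Hypothetical helper for illustration; not the paper's method.
    """
    u = np.linspace(1e-3, 5.0, num_points)  # search over positive u only
    v = w / u                                # the constraint fixes v
    return np.min(0.5 * (u ** 2 + v ** 2))

# The minimised weight decay should match |w| for each target weight.
for w in [0.25, 1.0, -2.0]:
    print(f"w={w:+.2f}  min weight decay={min_weight_decay(w):.6f}  |w|={abs(w):.6f}")
```

The penalty on the factors is smooth everywhere, which is one reason reparameterized objectives can be easier to optimize than the non-smooth penalty they induce on $w$.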

Problem

Research questions and friction points this paper is trying to address.

sparse optimization

ℓ_p regularization

optimization instability

gradient unboundedness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reparameterization

Weight decay

Adaptive learning rate

Sparse optimization