Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

📅 2026-05-24
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work addresses the optimization instability of ℓ_p regularization with 0 < p < 1, where the gradient of the penalty grows without bound as a weight approaches zero. The authors propose ReWA, a method that combines reparameterization, weight decay, and an adaptive learning rate. ReWA is closely connected to ℓ_p regularization but induces a different optimization landscape, which helps mitigate the instability. Experiments on CIFAR-10 and ImageNet with ResNet architectures show that ReWA reaches significantly higher sparsity than ℓ_1 regularization while preserving test accuracy.
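The unbounded-gradient issue can be seen directly from the derivative of the penalty |w|^p, which is p·|w|^(p-1)·sign(w) for w ≠ 0. The following is a minimal sketch in plain Python. The function name `lp_penalty_grad` and the sample values are illustrative assumptions and do not come from the paper.

```python
def lp_penalty_grad(w: float, p: float) -> float:
    """Derivative of |w|**p with respect to w, defined for w != 0."""
    if w == 0.0:
        raise ValueError("the derivative is not defined at w = 0 for p <= 1")
    sign = 1.0 if w > 0 else -1.0
    return p * abs(w) ** (p - 1.0) * sign


# Compare p = 0.5 (gradient grows as w shrinks) with p = 1 (gradient stays at 1).
for w in (1.0, 1e-2, 1e-4, 1e-6):
    print(f"w={w:g}  p=0.5: {lp_penalty_grad(w, 0.5):g}  p=1: {lp_penalty_grad(w, 1.0):g}")
```

For p = 1 the magnitude stays constant, while for p = 0.5 it scales like 1/sqrt(|w|), so a fixed step size can overshoot badly once weights get small.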
📝 Abstract
Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, it may encounter optimization instability due to the unbounded gradients when $0<p<1$. In this paper, we introduce a novel approach to sparse optimization termed ReWA, based on Reparameterization, Weight decay, and Adaptive learning rate. ReWA is closely connected to $\ell_p$-regularization, yet it unveils a distinct optimization landscape that helps mitigate instability issues. Experiments on CIFAR-10 and ImageNet with ResNets demonstrate that ReWA leads to significant sparsity improvements over the $\ell_1$-regularization approach while preserving test accuracy.
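As background on why reparameterization plus weight decay can relate to sparsity-inducing penalties, a standard identity is that for a product parameterization w = u·v, the smallest weight-decay cost (u² + v²)/2 over all factorizations of w equals |w| (by the AM-GM inequality, with equality at |u| = |v|). The sketch below checks this numerically. It illustrates only this general connection and is not the paper's ReWA formulation; the abstract does not specify the reparameterization used or how the adaptive learning rate enters. All names here are hypothetical.

```python
import math


def weight_decay_cost(u: float, v: float) -> float:
    """Standard L2 weight-decay penalty on the two factors."""
    return 0.5 * (u * u + v * v)


def min_cost_for_product(w: float, num_grid: int = 20000) -> float:
    """Grid-search the smallest weight-decay cost over (u, v) with u * v = w.

    Only u in (0, 4) is searched, which covers the minimizer
    u = sqrt(|w|) for |w| < 16. The cost is symmetric in the sign of u.
    """
    best = math.inf
    for i in range(1, num_grid):
        u = 4.0 * i / num_grid
        v = w / u  # enforce the constraint u * v = w
        best = min(best, weight_decay_cost(u, v))
    return best


# The minimal factor-space cost should match |w| up to grid error.
for w in (0.25, -1.0, 2.0):
    print(f"w={w:g}  min cost={min_cost_for_product(w):.6f}  |w|={abs(w):g}")
```

The same AM-GM argument with k factors and the cost averaged over factors gives |w|^(2/k), which is one way a smooth penalty in factor space can correspond to an ℓ_p-type penalty with p < 1 in the original weights.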
Problem

Research questions and friction points this paper is trying to address.

sparse optimization
ℓ_p regularization
optimization instability
gradient unboundedness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reparameterization
Weight decay
Adaptive learning rate
Sparse optimization
ℓ_p regularization