🤖 AI Summary
This work addresses the instability of sparse optimization under ℓ_p regularization with 0 < p < 1, which arises because the gradient of the penalty is unbounded near zero. The authors propose ReWA, a method that combines reparameterization, weight decay, and an adaptive learning rate. ReWA is closely connected to ℓ_p regularization but yields a distinct optimization landscape that helps mitigate this instability. Experiments on CIFAR-10 and ImageNet with ResNet architectures show that ReWA achieves significantly higher sparsity than ℓ_1 regularization while preserving test accuracy.
📝 Abstract
Sparse optimization is a fundamental challenge in various practical applications. A popular approach to sparse optimization is $\ell_p$ regularization. However, when $0<p<1$ it can suffer from optimization instability, because the gradient of the penalty is unbounded near zero. In this paper, we introduce a novel approach to sparse optimization termed ReWA, based on Reparameterization, Weight decay, and Adaptive learning rate. ReWA is closely connected to $\ell_p$ regularization, yet it unveils a distinct optimization landscape that helps mitigate instability issues. Experiments on CIFAR-10 and ImageNet with ResNets demonstrate that ReWA leads to significant sparsity improvements over the $\ell_1$ regularization approach while preserving test accuracy.
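
The unbounded-gradient issue mentioned in the abstract can be seen numerically. The sketch below is only an illustration of the standard $\ell_p$ penalty $|w|^p$, not of ReWA itself; the helper name `lp_penalty_grad` and the sample weights are assumptions introduced here. For $w \neq 0$ the derivative is $p\,\mathrm{sign}(w)\,|w|^{p-1}$, which grows without bound as $w \to 0$ when $0<p<1$, whereas it stays at magnitude 1 for $p=1$.

```python
import numpy as np

def lp_penalty_grad(w, p):
    """Derivative of |w|^p with respect to w (hypothetical helper, valid for w != 0)."""
    return p * np.sign(w) * np.abs(w) ** (p - 1.0)

# Illustrative weights approaching zero (values chosen for this sketch only).
weights = np.array([1e-1, 1e-3, 1e-6])

grad_half = lp_penalty_grad(weights, p=0.5)  # 0 < p < 1: magnitude grows as w -> 0
grad_one = lp_penalty_grad(weights, p=1.0)   # p = 1: magnitude stays at 1

for w, g_half, g_one in zip(weights, grad_half, grad_one):
    print(f"w={w:.0e}  p=0.5 grad={g_half:.3e}  p=1 grad={g_one:.1f}")
```

With a fixed step size, a gradient step on a near-zero weight under $p=0.5$ can therefore be far larger than the weight itself, which is one way to picture the instability that the abstract refers to.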