Mask in the Mirror: Implicit Sparsification

📅 2024-08-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the poorly understood mechanism underlying continuous sparsification, one of the most effective approaches to compressing large neural networks for inference. From a learning-dynamics perspective, the authors show that the method's implicit regularization evolves over training: an initial L₂-biased phase gradually transitions to an L₁-sparse preference. Building on this insight, they propose PILoT, a continuous sparsification approach with a novel initialization and a dynamic regularization scheme that steers the implicit bias via a time-dependent Bregman potential. Within an extended mirror flow framework, they establish convergence and optimality guarantees for underdetermined linear regression, explaining why implicit L₁ regularization outperforms explicit L₁ regularization. Experiments show that PILoT consistently surpasses baselines on standard benchmarks, achieving superior accuracy–sparsity trade-offs.
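The L₂-to-L₁ transition described above can be made concrete, up to constants, in the classical diagonal-network analysis of the product parameterization (a standard background result, not the paper's exact statement): gradient flow on the factors (m, u) with w = m ⊙ u and initialization scale α is a mirror flow on w whose potential is quadratic at small scales and L₁-like at large scales:

```latex
% Gradient flow on (m, u), with w = m \odot u started at scale \alpha,
% is a mirror flow on w for a hypentropy-type potential \Phi_\alpha:
\frac{d}{dt}\,\nabla \Phi_\alpha\bigl(w(t)\bigr) = -\nabla L\bigl(w(t)\bigr),
\qquad
\Phi_\alpha(w) \approx
\begin{cases}
\displaystyle\sum_i \frac{w_i^2}{\alpha^2} & |w_i| \ll \alpha^2 \quad (\text{early: } L_2\text{-like}),\\[1ex]
\displaystyle\sum_i |w_i| \ln\frac{|w_i|}{\alpha^2} & |w_i| \gg \alpha^2 \quad (\text{late: } L_1\text{-like}).
\end{cases}
```

Early in training all coordinates sit near the initialization scale, so the first regime dominates; as coordinates grow past α², the potential becomes L₁-like, matching the transition the summary describes. The paper's contribution, on this reading, is making the potential time-dependent so that this trajectory can be actively controlled.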

📝 Abstract
Continuous sparsification strategies are among the most effective methods for reducing the inference costs and memory demands of large-scale neural networks. A key factor in their success is the implicit $L_1$ regularization induced by jointly learning both mask and weight variables, which has been shown experimentally to outperform explicit $L_1$ regularization. We provide a theoretical explanation for this observation by analyzing the learning dynamics, revealing that early continuous sparsification is governed by an implicit $L_2$ regularization that gradually transitions to an $L_1$ penalty over time. Leveraging this insight, we propose a method to dynamically control the strength of this implicit bias. Through an extension of the mirror flow framework, we establish convergence and optimality guarantees in the context of underdetermined linear regression. Our theoretical findings may be of independent interest, as we demonstrate how to enter the rich regime and show that the implicit bias can be controlled via a time-dependent Bregman potential. To validate these insights, we introduce PILoT, a continuous sparsification approach with novel initialization and dynamic regularization, which consistently outperforms baselines in standard experiments.
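As an illustrative sketch of the setting in the abstract (underdetermined linear regression with a mask–weight product parameterization): training w = m ⊙ u by plain gradient descent from a small initialization tends to recover a sparse interpolating solution, i.e., an implicit L₁-like bias. All names and hyperparameters below are illustrative choices of mine, not the paper's PILoT method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined linear regression: n samples, d > n features, sparse ground truth.
n, d = 30, 60
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
support = [5, 17, 42]
w_true[support] = [3.0, -2.0, 4.0]
y = X @ w_true

# Continuous-sparsification-style parameterization: w = m * u (elementwise).
# A small initialization scale alpha puts training in the "rich"/sparse regime.
alpha = 0.05
m = np.full(d, alpha)   # "mask" variables
u = np.zeros(d)         # "weight" variables

lr, steps = 0.02, 30_000
for _ in range(steps):
    w = m * u
    g = X.T @ (X @ w - y) / n              # gradient of 0.5 * MSE w.r.t. w
    m, u = m - lr * g * u, u - lr * g * m  # chain rule through w = m * u

w = m * u
loss = 0.5 * np.mean((X @ w - y) ** 2)
```

Although there are infinitely many interpolating solutions (d > n), the off-support coordinates stay near zero because they start at scale α² and only grow if they correlate with the residual; the result is close to the sparse ground truth, which is what an explicit L₁ penalty would also target.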
Problem

Research questions and friction points this paper is trying to address.

Reducing inference costs of large-scale networks
Reducing memory demands
Controlling the strength of the implicit bias during training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit L₁ regularization induced by joint mask–weight learning
Dynamic control of the implicit bias via a time-dependent Bregman potential
PILoT: continuous sparsification with novel initialization and dynamic regularization