Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates

📅 2026-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high memory and computational costs of over-parameterized neural networks in resource-constrained settings by proposing the first fully differentiable method for discovering strong lottery ticket subnetworks. By introducing a continuously relaxed Bernoulli gating mechanism, the approach optimizes only the learnable gating parameters end-to-end while keeping all network weights frozen at their initialized values, thereby minimizing an ℓ₀-regularized objective without relying on non-differentiable score selection or straight-through estimator approximations. The method achieves up to 90% sparsity with minimal accuracy loss across diverse architectures, including fully connected networks, CNNs (ResNet, Wide-ResNet), and vision transformers (ViT, Swin-T), reaching roughly double the sparsity of edge-popup at comparable accuracy while improving optimization efficiency and scalability.
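In symbols, the ℓ₀-regularized objective sketched above is presumably of the form (our notation; the paper's exact parameterization may differ):

$$\min_{\theta}\ \mathbb{E}_{z \sim q_\theta}\!\left[\mathcal{L}\big(f(x;\, w_0 \odot z)\big)\right] \;+\; \lambda\, \mathbb{E}_{z \sim q_\theta}\!\left[\lVert z \rVert_0\right],$$

where $w_0$ are the frozen initialized weights, $z$ are the per-weight Bernoulli gates with learnable parameters $\theta$, $\odot$ denotes elementwise masking, and $\lambda$ trades off accuracy against sparsity. The continuous relaxation replaces the discrete $z$ with a smooth surrogate, making the expected $\ell_0$ term differentiable in $\theta$.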

📝 Abstract
Over-parameterized neural networks incur prohibitive memory and computational costs for resource-constrained deployment. The Strong Lottery Ticket (SLT) hypothesis suggests that randomly initialized networks contain sparse subnetworks achieving competitive accuracy without weight training. Existing SLT methods, notably edge-popup, rely on non-differentiable score-based selection, limiting optimization efficiency and scalability. We propose using continuously relaxed Bernoulli gates to discover SLTs through fully differentiable, end-to-end optimization - training only gating parameters while keeping all network weights frozen at their initialized values. Continuous relaxation enables direct gradient-based optimization of an $\ell_0$-regularization objective, eliminating the need for non-differentiable gradient estimators or iterative pruning cycles. To our knowledge, this is the first fully differentiable approach for SLT discovery that avoids straight-through estimator approximations. Experiments across fully connected networks, CNNs (ResNet, Wide-ResNet), and Vision Transformers (ViT, Swin-T) demonstrate up to 90% sparsity with minimal accuracy loss - nearly double the sparsity achieved by edge-popup at comparable accuracy - establishing a scalable framework for pre-training network sparsification.
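The gating mechanism described in the abstract can be sketched as follows. This is a minimal NumPy illustration assuming a binary-concrete (relaxed Bernoulli) parameterization; the paper's exact relaxation, ℓ₀ surrogate, and test-time thresholding rule may differ, and all function names here are hypothetical:

```python
import numpy as np

def relaxed_bernoulli_gate(log_alpha, temperature, rng):
    """Sample continuously relaxed Bernoulli (binary-concrete) gates in (0, 1).

    log_alpha: learnable per-weight gate logits (the only trained parameters).
    temperature: relaxation temperature; as it -> 0 the gates approach {0, 1}.
    """
    # Logistic reparameterization: gradients flow through log_alpha.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    logistic_noise = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(log_alpha + logistic_noise) / temperature))

def gated_forward(x, frozen_w, log_alpha, temperature, rng):
    """Linear layer whose weights stay frozen; only the gates are stochastic."""
    gates = relaxed_bernoulli_gate(log_alpha, temperature, rng)
    return x @ (frozen_w * gates)

def expected_l0(log_alpha):
    """Differentiable surrogate for the L0 penalty: expected count of open gates."""
    return (1.0 / (1.0 + np.exp(-log_alpha))).sum()

rng = np.random.default_rng(0)
frozen_w = rng.standard_normal((8, 4))   # weights frozen at initialization
log_alpha = np.zeros((8, 4))             # gate logits: the trained parameters
x = rng.standard_normal((2, 8))
y = gated_forward(x, frozen_w, log_alpha, temperature=0.5, rng=rng)
```

Training would then backpropagate the task loss plus `expected_l0` into `log_alpha` alone, and a final hard mask (e.g. thresholding the gates) would yield the sparse subnetwork.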
Problem

Research questions and friction points this paper is trying to address.

over-parameterized neural networks
Strong Lottery Ticket
sparsity
resource-constrained deployment
non-differentiable selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Strong Lottery Ticket
Continuously Relaxed Bernoulli Gates
Differentiable Sparsification
ℓ₀-Regularization
Frozen Weights