Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search

📅 2025-12-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the sharp accuracy loss that saliency-based Pruning-at-Initialization (PaI) methods suffer when searching for “winning tickets” at high sparsity. It proposes Concrete Ticket Search (CTS), which formulates subnetwork discovery as a holistic combinatorial optimization problem. CTS uses a Concrete relaxation of the discrete search space, introduces GRADBALANCE, a gradient balancing scheme that controls sparsity, and adopts a knowledge distillation-inspired pruning objective (CTS-KL) that minimizes the reverse KL divergence between sparse and dense network outputs. CTS identifies subnetworks near initialization without sensitive hyperparameter tuning. With ResNet-20 on CIFAR-10, it reaches 74.0% accuracy at 99.3% sparsity in 7.9 minutes, whereas Lottery Ticket Rewinding (LTR) reaches 68.3% accuracy at the same sparsity in 95.2 minutes; CTS also outperforms saliency-based PaI methods across all sparsities. The key contributions are: (i) a knowledge distillation-inspired family of objectives for winning ticket search, and (ii) gradient balancing to control sparsity without sensitive hyperparameter tuning.

📝 Abstract

The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networks. However, state-of-the-art methods of drawing these tickets, like Lottery Ticket Rewinding (LTR), are computationally prohibitive, while more efficient saliency-based Pruning-at-Initialization (PaI) techniques suffer from a significant accuracy-sparsity trade-off and fail basic sanity checks. In this work, we argue that PaI's reliance on first-order saliency metrics, which ignore inter-weight dependencies, contributes substantially to this performance gap, especially in the sparse regime. To address this, we introduce Concrete Ticket Search (CTS), an algorithm that frames subnetwork discovery as a holistic combinatorial optimization problem. By leveraging a Concrete relaxation of the discrete search space and a novel gradient balancing scheme (GRADBALANCE) to control sparsity, CTS efficiently identifies high-performing subnetworks near initialization without requiring sensitive hyperparameter tuning. Motivated by recent works on lottery ticket training dynamics, we further propose a knowledge distillation-inspired family of pruning objectives, finding that minimizing the reverse Kullback-Leibler divergence between sparse and dense network outputs (CTS-KL) is particularly effective. Experiments on varying image classification tasks show that CTS produces subnetworks that robustly pass sanity checks and achieve accuracy comparable to or exceeding LTR, while requiring only a small fraction of the computation. For example, on ResNet-20 on CIFAR10, it reaches 99.3% sparsity with 74.0% accuracy in 7.9 minutes, while LTR attains the same sparsity with 68.3% accuracy in 95.2 minutes. CTS's subnetworks outperform saliency-based methods across all sparsities, but its advantage over LTR is most pronounced in the highly sparse regime.
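The Concrete relaxation named in the abstract can be illustrated with a minimal sketch. It assumes the common binary Concrete (relaxed Bernoulli) form, in which each weight gets a learnable mask logit and a temperature controls how close the sample is to 0 or 1. The names `sample_concrete_mask`, `hard_mask`, `mask_logits`, and the example values are hypothetical, and the sketch does not reproduce the authors' implementation or GRADBALANCE.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_concrete_mask(mask_logits, temperature, rng):
    """Draw one relaxed Bernoulli (binary Concrete) sample per weight.

    Assumption: one mask logit per weight. Lower temperatures push the
    samples toward 0 or 1; higher temperatures keep them near 0.5.
    """
    soft = []
    for logit in mask_logits:
        # Clamp the uniform draw away from 0 and 1 so the logs stay finite.
        u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
        logistic_noise = math.log(u) - math.log1p(-u)
        soft.append(sigmoid((logit + logistic_noise) / temperature))
    return soft


def hard_mask(mask_logits):
    """Discretize after the search: keep a weight when its logit is positive."""
    return [1 if logit > 0 else 0 for logit in mask_logits]


rng = random.Random(0)
mask_logits = [-4.0, -1.0, 0.5, 3.0]   # hypothetical learned mask parameters
weights = [0.2, -0.7, 1.1, -0.3]       # hypothetical initial weights

soft_mask = sample_concrete_mask(mask_logits, temperature=0.5, rng=rng)
# During the search the relaxed mask scales the weights, so a loss computed
# on the masked network is differentiable with respect to the mask logits.
masked_weights = [w * m for w, m in zip(weights, soft_mask)]
```

Because the relaxed mask is a smooth function of the logits, all mask entries can be optimized jointly by gradient descent, in contrast to scoring each weight independently with a saliency metric.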

Problem

Research questions and friction points this paper is trying to address.

Identifies high-performing sparse subnetworks efficiently near initialization

Addresses accuracy-sparsity trade-off in pruning-at-initialization methods

Improves over computationally expensive lottery ticket rewinding techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Concrete relaxation for combinatorial subnetwork optimization

Gradient balancing scheme (GRADBALANCE) to control sparsity without sensitive hyperparameter tuning

Reverse KL divergence objective for pruning via distillation
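To make the reverse KL objective concrete, here is a minimal sketch for a single input. It assumes that "reverse" means the sparse network's output distribution is the first argument, KL(p_sparse || p_dense), as opposed to the forward direction KL(p_dense || p_sparse) typical of standard distillation. That reading, the helper names, and the example logits are assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


# Hypothetical class logits from the dense and the sparse network.
dense_probs = softmax([2.0, 0.5, -1.0])
sparse_probs = softmax([1.0, 1.0, -0.5])

# Assumed "reverse" direction: expectation taken under the sparse network.
reverse_kl = kl_divergence(sparse_probs, dense_probs)
# Forward direction, shown only for comparison.
forward_kl = kl_divergence(dense_probs, sparse_probs)
```

The two directions generally give different values, so the choice of direction changes which mismatches between sparse and dense outputs are penalized most heavily.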

🔎 Similar Papers

Partially Frozen Random Networks Contain Compact Strong Lottery Tickets