Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the core limitation of the Lottery Ticket Hypothesis (LTH): the poor generalizability of sparse masks across different random weight initializations. We identify misalignment of optimization basins as the fundamental cause. To resolve this, we propose a weight-symmetry-based mask permutation strategy—leveraging neuron-level weight distribution analysis and permutation-based alignment—to dynamically adapt pre-identified LTH masks to the optimization basin of a newly initialized model, thereby enabling transferable sparse training. Unlike conventional LTH approaches that yield initialization-specific masks, our method achieves cross-initialization mask reusability. Extensive experiments on CIFAR-10/100 and ImageNet with VGG-11, ResNet-20, and ResNet-50 demonstrate substantial improvements in sparse-training accuracy. This work establishes a new paradigm for efficient and robust sparse neural network training.

📝 Abstract
The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding an LTH solution is computationally expensive, and an LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations, and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random initialization. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask, compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10, CIFAR-100, and ImageNet) and models (VGG11, ResNet20, and ResNet50).
Problem

Research questions and friction points this paper is trying to address.

Aligning LTH masks with new random initializations for sparse training
Improving generalization of sparse models via permuted mask alignment
Addressing computational cost and mask generalization in Lottery Ticket Hypothesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligning LTH masks using weight symmetry
Permuting masks to match new optimization basins
Enhancing generalization in sparse training
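The mask-permutation idea above can be illustrated with a minimal sketch: once a neuron-level permutation has been found (e.g. by a weight-matching alignment procedure in the spirit of the basin-alignment work the abstract cites), the LTH mask for each layer is reindexed with that permutation before sparse training from the new initialization. The function and permutation arrays below are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np

def permute_mask(mask: np.ndarray, perm_out: np.ndarray, perm_in: np.ndarray) -> np.ndarray:
    """Reorder a binary mask for one linear layer: rows follow the output-unit
    permutation, columns follow the permutation inherited from the previous layer."""
    return mask[perm_out][:, perm_in]

rng = np.random.default_rng(0)
# Binary LTH mask for a hypothetical 4x3 weight matrix.
mask = (rng.random((4, 3)) > 0.5).astype(np.float32)

# Placeholder permutations; in practice these would come from aligning the
# original initialization with the new one.
perm_out = np.array([2, 0, 3, 1])
perm_in = np.array([1, 2, 0])

permuted = permute_mask(mask, perm_out, perm_in)
# Permutation only reindexes units, so the sparsity level is unchanged.
assert permuted.sum() == mask.sum()
```

Note that consecutive layers must share the permutation along their common dimension (`perm_in` of one layer equals `perm_out` of the previous one), so the permuted network computes the same function class as the original.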