Partially Frozen Random Networks Contain Compact Strong Lottery Tickets

📅 2024-02-20
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the fundamental trade-off between accuracy and model size in the Strong Lottery Ticket (SLT) hypothesis. It proposes a partial freezing mechanism for randomly initialized networks: a random subset of the initial weights is frozen, either by permanently pruning it or by locking it as a fixed part of the ticket, reducing SLT memory overhead without imposing sparsity constraints on the tickets that can be found. This is the first approach to enable SLT search at arbitrary sparsity levels, avoiding the accuracy degradation caused by the high sparsity that prior methods require. Integrating permanent pruning with the Edge-Popup algorithm, the authors discover and validate SLTs on architectures including ResNet. On ImageNet, freezing 70% of the parameters yields an SLT 3.3× smaller than the one found within the dense counterpart, with an accuracy gain of up to 14.12 percentage points over the one found within a randomly pruned counterpart, significantly improving the accuracy-to-model size Pareto frontier.
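As a rough illustration of the freezing step, the PyTorch sketch below splits a weight tensor's entries into permanently pruned, locked, and searchable subsets. The function name `make_freeze_masks` and the `prune_fraction` parameter governing the pruned/locked split are illustrative assumptions, not the paper's actual interface.

```python
import torch

def make_freeze_masks(weight: torch.Tensor, freeze_ratio: float = 0.7,
                      prune_fraction: float = 0.5):
    """Randomly freeze a subset of `weight` before the SLT search.

    Returns three boolean masks of the same shape as `weight`:
      pruned     - frozen entries permanently removed (fixed to zero)
      locked     - frozen entries permanently kept at their random init
      searchable - entries Edge-Popup may still keep or drop
    """
    n = weight.numel()
    perm = torch.randperm(n)                   # seed this in practice for reproducibility
    n_frozen = int(freeze_ratio * n)
    n_pruned = int(prune_fraction * n_frozen)  # pruned/locked split (our assumption)

    pruned = torch.zeros(n, dtype=torch.bool)
    locked = torch.zeros(n, dtype=torch.bool)
    pruned[perm[:n_pruned]] = True
    locked[perm[n_pruned:n_frozen]] = True
    searchable = ~(pruned | locked)
    return (pruned.view_as(weight), locked.view_as(weight),
            searchable.view_as(weight))
```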

📝 Abstract
Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning: strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with a better accuracy-to-model size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing $70\%$ of a ResNet on ImageNet provides $3.3\times$ compression compared to the SLT found within a dense counterpart, raises accuracy by up to $14.12$ points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-model size trade-off than both.
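To connect the freezing to the Edge-Popup search, the following sketch shows one plausible way a layer could restrict the score-based top-k selection to the searchable weights while always keeping the locked ones. The `GetSubnet` function follows the published Edge-Popup recipe (top-k scores with a straight-through gradient); the layer class and the exact wiring of the masks are our own assumptions, not the authors' implementation, and it reuses `make_freeze_masks` from the sketch above.

```python
import torch
import torch.nn.functional as F

class GetSubnet(torch.autograd.Function):
    """Top-k binarization with a straight-through gradient, as in Edge-Popup."""
    @staticmethod
    def forward(ctx, scores, k):
        mask = torch.zeros_like(scores)
        _, idx = scores.flatten().topk(k)
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad):
        return grad, None  # pass gradients straight through to the scores

class FrozenEdgePopupLinear(torch.nn.Module):
    """Linear layer whose subnetwork is searched only over non-frozen weights."""
    def __init__(self, in_features, out_features, keep_ratio=0.5, freeze_ratio=0.7):
        super().__init__()
        w = torch.empty(out_features, in_features)
        torch.nn.init.kaiming_normal_(w)
        self.weight = torch.nn.Parameter(w, requires_grad=False)  # never trained

        # Masks from the freezing sketch above; pruned entries are simply masked to zero
        pruned, locked, searchable = make_freeze_masks(w, freeze_ratio)
        self.register_buffer("locked", locked.float())
        self.register_buffer("searchable", searchable.float())

        self.scores = torch.nn.Parameter(torch.rand_like(w))      # only trainable tensor
        self.keep_ratio = keep_ratio

    def forward(self, x):
        # Frozen weights get score 0, so they can never enter the top-k subnet
        s = self.scores.abs() * self.searchable
        k = int(self.keep_ratio * self.searchable.sum().item())
        subnet = GetSubnet.apply(s, k)
        # Locked weights are always part of the SLT; pruned ones contribute nothing
        return F.linear(x, self.weight * (subnet + self.locked))
```

In a full model, every linear or convolutional layer would be wrapped this way and only the `scores` tensors optimized; the weights themselves stay at initialization throughout. Intuitively, since the frozen positions can be fixed by a seeded random draw, only the searchable weights need an explicit mask bit, which is one way the memory footprint of the resulting ticket can shrink.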
Problem

Research questions and friction points this paper is trying to address.

Compressing the memory footprint of SLTs
Avoiding the accuracy loss caused by overly sparse source networks
Improving the accuracy-to-model size trade-off
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partial freezing (pruning or locking) of random weights for compression
SLT search with Edge-Popup within frozen networks
Compression without constraining the sparsity of discoverable SLTs
👥 Authors
Hikari Otsuka
Department of Information and Communications Engineering, Tokyo Institute of Technology, Japan
Daiki Chijiwa
NTT
Ángel López García-Arias
Department of Information and Communications Engineering, Tokyo Institute of Technology, Japan
Yasuyuki Okoshi
Department of Information and Communications Engineering, Tokyo Institute of Technology, Japan
Kazushi Kawamura
Waseda University
Thiem Van Chu
Institute of Science Tokyo
Daichi Fujiki
Institute of Science Tokyo
Susumu Takeuchi
NTT Corporation, Japan
Masato Motomura
Department of Information and Communications Engineering, Tokyo Institute of Technology, Japan