🤖 AI Summary
This work proposes RTL, an adaptive pruning framework that addresses a limitation of traditional pruning methods: they assume a single subnetwork is universally optimal for all inputs and thereby overlook intrinsic data heterogeneity. RTL aligns subnetwork architecture with data heterogeneity by employing a dynamic routing mechanism to train specialized subnetworks—termed "adaptive tickets"—for distinct classes, semantic clusters, or environmental conditions. The method introduces an unsupervised subnetwork similarity metric to diagnose and mitigate over-sparsification without labels. Experimental results demonstrate that RTL achieves significant gains in balanced accuracy and recall over both single-model and multi-model baselines across multiple datasets, while using up to 10 times fewer parameters than independent models. Moreover, the pruned subnetworks exhibit strong semantic alignment.
📝 Abstract
In pruning, the Lottery Ticket Hypothesis posits that large networks contain sparse subnetworks, or winning tickets, that can be trained in isolation to match the performance of their dense counterparts. However, most existing approaches assume a single universal winning ticket shared across all inputs, ignoring the inherent heterogeneity of real-world data. In this work, we propose Routing the Lottery (RTL), an adaptive pruning framework that discovers multiple specialized subnetworks, called adaptive tickets, each tailored to a class, semantic cluster, or environmental condition. Across diverse datasets and tasks, RTL consistently outperforms single- and multi-model baselines in balanced accuracy and recall, while using up to 10 times fewer parameters than independent models and exhibiting semantically aligned specialization. Furthermore, we identify subnetwork collapse, a performance drop under aggressive pruning, and introduce a subnetwork similarity score that enables label-free diagnosis of over-sparsification. Overall, our results recast pruning as a mechanism for aligning model structure with data heterogeneity, paving the way toward more modular and context-aware deep learning.
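The abstract does not specify how the subnetwork similarity score is computed. One plausible instantiation, sketched below purely for illustration, measures the Jaccard overlap between the binary pruning masks of two adaptive tickets: if specialized subnetworks become nearly identical under aggressive pruning, their average pairwise overlap rises, which could serve as a label-free warning sign of subnetwork collapse. The function names and the use of mask overlap are assumptions, not the paper's method.

```python
import numpy as np

def mask_similarity(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Jaccard overlap between two binary pruning masks (hypothetical metric)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks are trivially identical
    return float(np.logical_and(a, b).sum() / union)

def mean_pairwise_similarity(masks) -> float:
    """Average Jaccard similarity over all pairs of subnetwork masks."""
    n = len(masks)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(mask_similarity(masks[i], masks[j]) for i, j in pairs) / len(pairs)

# Toy example: three tickets over a 6-weight layer.
m1 = np.array([1, 1, 0, 0, 1, 0])
m2 = np.array([1, 1, 0, 0, 1, 0])  # identical to m1: a sign of collapse
m3 = np.array([0, 0, 1, 1, 0, 1])  # disjoint from m1: well-specialized
print(mask_similarity(m1, m2))          # 1.0
print(mask_similarity(m1, m3))          # 0.0
print(mean_pairwise_similarity([m1, m2, m3]))
```

A high mean pairwise similarity across tickets would suggest the routing mechanism is no longer producing meaningfully distinct subnetworks, which is exactly the over-sparsification regime the paper's metric is meant to flag.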