🤖 AI Summary
This work addresses the central challenge of the strong lottery ticket hypothesis: identifying high-performing subnetworks without any training. We propose a gradient-free evolutionary framework based on genetic algorithms that directly discovers accurate, highly sparse subnetworks within randomly initialized, over-parameterized neural networks. Departing from conventional training-based pruning paradigms, our method eliminates both training and gradient computation, instead employing structural encoding, fitness-driven selection, and mutation operations to identify subnetworks in an unsupervised, label-free manner. Evaluated on multiple binary and multi-class benchmark tasks, the discovered subnetworks achieve accuracy comparable or superior to that of fully trained networks despite sparsity exceeding 90%. These results empirically validate the feasibility of constructing efficient models without training and offer a novel pathway toward lightweight deep learning.
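The mechanism described above (a binary mask per weight as the structural encoding, fitness-driven selection with elitism, and bit-flip mutation) can be sketched as follows. This is an illustrative toy, not the paper's exact algorithm: the network sizes, the accuracy-based fitness, the linearly separable toy task, and all hyperparameters are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(d_in=8, d_hidden=16, d_out=2):
    """A fixed, randomly initialized two-layer network that is never trained."""
    return {"W1": rng.standard_normal((d_in, d_hidden)),
            "W2": rng.standard_normal((d_hidden, d_out))}

def forward(net, mask, x):
    """Apply the random network with each weight gated by its binary mask."""
    h = np.maximum(x @ (net["W1"] * mask["W1"]), 0.0)  # ReLU
    return h @ (net["W2"] * mask["W2"])

def random_mask(net, density=0.1):
    """Structural encoding: one binary gene per weight, ~`density` kept."""
    return {k: (rng.random(w.shape) < density).astype(float)
            for k, w in net.items()}

def mutate(mask, rate=0.02):
    """Bit-flip mutation: invert a small fraction of genes (no gradients)."""
    return {k: np.where(rng.random(m.shape) < rate, 1.0 - m, m)
            for k, m in mask.items()}

def evolve(net, fitness, pop_size=20, n_elite=5, generations=30):
    """Fitness-driven selection: keep the elite, refill by mutating elites."""
    pop = [random_mask(net) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:n_elite]
        pop = elite + [mutate(elite[rng.integers(n_elite)])
                       for _ in range(pop_size - n_elite)]
    return max(pop, key=fitness)

# Hypothetical toy task for illustration: label = sign of first coordinate.
X = rng.standard_normal((64, 8))
y = (X[:, 0] > 0).astype(int)
net = init_network()

def fitness(mask):
    """Illustrative fitness: classification accuracy of the masked network."""
    return float((forward(net, mask, X).argmax(axis=1) == y).mean())

best = evolve(net, fitness)
sparsity = 1.0 - float(np.mean([m.mean() for m in best.values()]))
print(f"accuracy={fitness(best):.2f}, sparsity={sparsity:.2f}")
```

Note that the weights themselves are never updated; only the masks evolve, which is what makes the search gradient-free and training-free in the sense used above.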
📝 Abstract
Building modern deep learning systems that are not just effective but also efficient requires rethinking established paradigms for model training and neural architecture design. Instead of adapting highly overparameterized networks and subsequently applying model compression techniques to reduce resource consumption, a new class of high-performing networks forgoes expensive parameter updates entirely while requiring only a fraction of the parameters, making it highly scalable. The Strong Lottery Ticket Hypothesis posits that within randomly initialized, sufficiently overparameterized neural networks, there exist subnetworks that can match the accuracy of the trained original model without any training. This work explores the use of genetic algorithms for identifying these strong lottery ticket subnetworks. We find that on instances of binary and multi-class classification tasks, our approach achieves better accuracy and sparsity levels than the current state of the art without requiring any gradient information. In addition, we motivate the need for appropriate evaluation metrics when scaling to more complex network architectures and learning tasks.