Pruning at Initialisation through the lens of Graphon Limit: Convergence, Expressivity, and Generalisation

📅 2026-02-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work elucidates the theoretical mechanism by which initialization-based pruning methods induce global sparsity patterns in large-scale neural networks. By leveraging graph limit theory, the authors map discrete pruning heuristics to continuous graphon objects, thereby establishing the first graphon-based taxonomy of sparse network topologies that distinguishes between homogeneous and heterogeneous connectivity structures. This framework transforms combinatorial graph problems into continuous operator analysis. Integrating graphon neural tangent kernels, a factorized saliency model, and generalization bound derivations, the study demonstrates that the universal approximation capability of sparse networks hinges on the intrinsic dimensionality of the active coordinate subspace. Furthermore, it establishes an upper bound on generalization error explicitly governed by the underlying graphon.
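
A minimal sketch of the discrete-to-continuous mapping this rests on, assuming the standard step-function construction and cut distance from graph limit theory; the notation is ours and is not quoted from the paper:

```latex
% Step-function (empirical) graphon of a binary bipartite pruning mask
% M^{(n)} \in \{0,1\}^{n \times m}; notation assumed from the standard
% graph-limit construction, not taken from the paper.
\[
  W_n(x, y) \;=\; M^{(n)}_{\lceil n x \rceil,\, \lceil m y \rceil},
  \qquad x, y \in (0, 1],
  \qquad
  \delta_\square\!\big(W_n, W\big) \;\xrightarrow[n \to \infty]{}\; 0,
\]
\[
  \text{where}\quad
  \delta_\square(U, W) \;=\; \inf_{\varphi}\,
  \sup_{S, T \subseteq [0,1]}
  \Big|\int_{S \times T} \big(U(x, y) - W(\varphi(x), \varphi(y))\big)\, dx\, dy\Big|,
\]
% the infimum running over measure-preserving bijections \varphi of [0, 1].
```

Convergence in this metric is what allows combinatorial statements about finite masks to be replaced by statements about a single limit operator W.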

πŸ“ Abstract
Pruning at Initialisation methods discover sparse, trainable subnetworks before training, but their theoretical mechanisms remain elusive. Existing analyses are often limited to finite-width statistics, lacking a rigorous characterisation of the global sparsity patterns that emerge as networks grow large. In this work, we connect discrete pruning heuristics to graph limit theory via graphons, establishing the graphon limit of PaI masks. We introduce a Factorised Saliency Model that encompasses popular pruning criteria and prove that, under regularity conditions, the discrete masks generated by these algorithms converge to deterministic bipartite graphons. This limit framework establishes a novel topological taxonomy for sparse networks: while unstructured methods (e.g., Random, Magnitude) converge to homogeneous graphons representing uniform connectivity, data-driven methods (e.g., SNIP, GraSP) converge to heterogeneous graphons that encode implicit feature selection. Leveraging this continuous characterisation, we derive two fundamental theoretical results: (i) a Universal Approximation Theorem for sparse networks that depends only on the intrinsic dimension of active coordinate subspaces; and (ii) a Graphon-NTK generalisation bound demonstrating how the limit graphon modulates the kernel geometry to align with informative features. Our results transform the study of sparse neural networks from combinatorial graph problems into a rigorous framework of continuous operators, offering a new mechanism for analysing expressivity and generalisation in sparse neural networks.
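
To make the homogeneous-versus-heterogeneous taxonomy tangible, the sketch below contrasts a toy factorised saliency mask against a uniformly random mask and inspects the step-function graphon each induces. Everything here is an illustrative assumption: the function names, the multiplicative score s_ij = a_i · b_j, the sparsity level, and the synthetic factors a and b are ours, not the paper's Factorised Saliency Model or its experiments.

```python
import numpy as np

def factorised_saliency_mask(a, b, density):
    """Toy factorised saliency: score s_ij = a_i * b_j, keep the top fraction.

    The multiplicative form is an illustrative assumption, not the paper's
    Factorised Saliency Model.
    """
    s = np.outer(a, b)                        # (n_out, n_in) saliency scores
    k = int(density * s.size)                 # number of weights to keep
    thresh = np.partition(s.ravel(), -k)[-k]  # k-th largest score
    return (s >= thresh).astype(float)        # binary pruning mask

def empirical_graphon(mask, grid=200):
    """Sample the step-function graphon of a mask on a uniform grid over [0, 1]^2."""
    n, m = mask.shape
    xs = np.minimum((np.arange(grid) * n) // grid, n - 1)
    ys = np.minimum((np.arange(grid) * m) // grid, m - 1)
    return mask[np.ix_(xs, ys)]

rng = np.random.default_rng(0)
n = 512
a = rng.lognormal(size=n)                                             # output-neuron factors
b = np.where(rng.random(n) < 0.1, 5.0, 0.5) * rng.lognormal(size=n)   # a few "informative" inputs

hetero = empirical_graphon(factorised_saliency_mask(a, b, density=0.1))
homo = empirical_graphon((rng.random((n, n)) < 0.1).astype(float))    # random (unstructured) pruning

# Column-wise keep densities: concentrated for the factorised mask, flat for random pruning.
print(hetero.mean(axis=0).round(2)[:20])
print(homo.mean(axis=0).round(2)[:20])
```

On this toy example the column density of the factorised mask concentrates on the input coordinates given large factors b_j, i.e. a heterogeneous, feature-selecting limit, while the random mask's density stays flat near the target sparsity, matching the homogeneous case described in the abstract.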
Problem

Research questions and friction points this paper is trying to address.

Pruning at Initialisation
Graphon Limit
Sparse Neural Networks
Expressivity
Generalisation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graphon Limit
Pruning at Initialisation
Sparse Neural Networks
Universal Approximation
Generalisation Bound