The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates why sparse neural networks with identical sparsity levels can differ sharply in trainability. To explain how pruning-induced structural properties shape training dynamics, we propose a graphon-based representation framework grounded in graph limit theory (the first application of graphons to the analysis of neural network pruning), establishing a theoretical link between finite pruning structures and asymptotic training behavior in the infinite-width regime. By constructing a Graphon Neural Tangent Kernel (Graphon NTK) and analyzing its spectral properties, we uncover the implicit structural biases of different pruning methods and demonstrate their decisive impact on convergence. Empirical validation confirms that graphon-predicted convergence closely matches observed training dynamics. The approach yields the first unified, analytically tractable infinite-width modeling paradigm for sparse architectures, advancing both the theoretical understanding and the controllable design of trainable sparse networks.
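To make the graphon representation concrete: a binary pruning mask for one layer can be summarized as a step-function ("block-averaged") empirical graphon on [0,1]², and the Graphon Limit Hypothesis says such estimates stabilize as width grows. The sketch below is our own minimal illustration, not the authors' code; degree sorting as the canonical node ordering and the k×k grid resolution are assumptions made here for clarity.

```python
import numpy as np

def empirical_graphon(mask: np.ndarray, k: int = 32) -> np.ndarray:
    """Step-function (block-averaged) graphon estimate of a binary layer mask.

    mask : (n_out, n_in) 0/1 array for one pruned weight matrix.
    k    : resolution of the k x k step function approximating W on [0,1]^2.
    """
    # Sort rows and columns by degree so that structurally similar masks map
    # to similar step functions; degree sorting is one common canonical
    # ordering, chosen here for illustration.
    rows = np.argsort(-mask.sum(axis=1))
    cols = np.argsort(-mask.sum(axis=0))
    m = mask[np.ix_(rows, cols)]

    # Average the sorted mask over a k x k grid of blocks.
    r = np.linspace(0, m.shape[0], k + 1).astype(int)
    c = np.linspace(0, m.shape[1], k + 1).astype(int)
    W = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            block = m[r[i]:r[i + 1], c[j]:c[j + 1]]
            W[i, j] = block.mean() if block.size else 0.0
    return W

# Example: a random (Erdos-Renyi-style) mask at 90% sparsity; its empirical
# graphon is approximately the constant function W = 0.1.
rng = np.random.default_rng(0)
mask = (rng.random((512, 512)) < 0.1).astype(float)
print(empirical_graphon(mask).mean())   # ~0.1, the layer's density
```

Under the Graphon Limit Hypothesis, estimates like this, computed from a given pruning method's masks at increasing width, should converge to a limit graphon characteristic of that method, which is the structural bias the summary refers to.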

📝 Abstract
Sparse neural networks promise efficiency, yet training them effectively remains a fundamental challenge. Despite advances in pruning methods that create sparse architectures, why some sparse structures are more trainable than others at the same level of sparsity remains poorly understood. Aiming to develop a systematic approach to this fundamental problem, we propose a novel theoretical framework based on the theory of graph limits, particularly graphons, that characterizes sparse neural networks in the infinite-width regime. Our key insight is that the connectivity patterns of sparse neural networks induced by pruning methods converge to specific graphons as network width tends to infinity, and these graphons encode the implicit structural biases of different pruning methods. We postulate the Graphon Limit Hypothesis and provide empirical evidence to support it. Leveraging this graphon representation, we derive a Graphon Neural Tangent Kernel (Graphon NTK) to study the training dynamics of sparse networks in the infinite-width limit. The Graphon NTK provides a general framework for the theoretical analysis of sparse networks. We empirically show that spectral analysis of the Graphon NTK correlates with the observed training dynamics of sparse networks, explaining the varying convergence behaviours of different pruning methods. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.
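The Graphon NTK itself is an analytical infinite-width object; as a finite-width stand-in, the following sketch (our illustration under stated assumptions, not the paper's method) computes the empirical NTK Gram matrix of a one-hidden-layer ReLU network under a random pruning mask and inspects its spectrum, the diagnostic the abstract says correlates with training dynamics. The architecture, the Erdős–Rényi-style mask, and all sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, n = 2048, 16, 64                        # hidden width, input dim, samples
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

mask = (rng.random((m, d)) < 0.2).astype(float)   # 80%-sparse hidden layer
W = rng.standard_normal((m, d)) * mask            # only unmasked weights exist
a = rng.standard_normal(m)

# f(x) = a^T relu(W x) / sqrt(m); the empirical NTK is J J^T, where J stacks
# the gradients of f w.r.t. all trainable (unmasked) parameters per sample.
pre = X @ W.T                          # (n, m) preactivations
act = np.maximum(pre, 0.0)             # relu(pre)
dact = (pre > 0).astype(float)         # relu'(pre)

# Contribution of gradients w.r.t. the output weights a: sigma(Wx)/sqrt(m).
K_a = act @ act.T / m
# Contribution of gradients w.r.t. the masked hidden weights:
# grad_{W_j} f(x) = (a_j relu'(w_j . x) / sqrt(m)) * (mask_j * x).
G = dact * a                           # (n, m)
K_w = np.einsum('ij,pj,ik,pk,jk->ip', G, G, X, X, mask, optimize=True) / m

K = K_a + K_w
lam = np.linalg.eigvalsh(K)            # eigenvalues, ascending
print("condition number:", lam[-1] / lam[0])
# Under gradient flow on the squared loss, the residual component along the
# i-th eigenvector decays roughly like exp(-lam_i * t): a heavier tail of
# small eigenvalues predicts slower convergence for this connectivity pattern.
```

Repeating this for masks produced by different pruning methods (rather than the random mask used here) gives finite-width spectra whose shape, per the abstract, tracks the observed convergence differences.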
Problem

Research questions and friction points this paper is trying to address.

Understanding why some sparse network structures train better than others at the same sparsity level
Developing a theoretical framework, based on graphons, for analyzing sparse networks
Explaining how connectivity patterns affect the trainability of pruned networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using graphons to model sparse neural networks
Deriving Graphon NTK for training dynamics analysis
Explaining the convergence behaviour of different pruning methods via spectral analysis of the Graphon NTK (see the sketch after this list)
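As a hedged illustration of the last bullet: under linearized (NTK) gradient flow on the squared loss, the training residual is determined in closed form by the kernel's eigendecomposition, so two pruning methods' predicted convergence curves can be compared directly from their NTK spectra. `K1`, `K2`, and `y` below are hypothetical inputs, e.g. Gram matrices from the sketch under the abstract.

```python
import numpy as np

def predicted_residual(K: np.ndarray, y: np.ndarray, ts: np.ndarray) -> np.ndarray:
    """Squared residual ||f_t - y||^2 predicted by kernel gradient flow.

    Linearized dynamics give f_t - y = exp(-K t)(f_0 - y); starting from
    f_0 = 0, the residual at time t is sum_i exp(-2 lam_i t) <v_i, y>^2.
    """
    lam, V = np.linalg.eigh(K)
    coef = (V.T @ y) ** 2                 # energy of y along each eigenvector
    return np.array([(coef * np.exp(-2.0 * lam * t)).sum() for t in ts])

# Hypothetical usage: compare two pruning methods' kernels on shared targets.
# ts = np.linspace(0.0, 10.0, 50)
# r1, r2 = predicted_residual(K1, y, ts), predicted_residual(K2, y, ts)
```

This is the sense in which the spectrum "explains" trainability: a kernel whose large eigenvalues are aligned with the targets yields a fast-decaying predicted curve, while mass on small eigenvalues yields a slow one.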