Breaking Neural Network Scaling Laws with Modularity

📅 2024-09-09
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Non-modular neural networks require training data that grows exponentially with input dimensionality on high-dimensional combinatorial tasks, a fundamental bottleneck for generalization. Method: The authors investigate, theoretically and empirically, how modular neural networks generalize. They prove that on modularly structured tasks, modular architectures achieve sample complexity independent of task dimensionality, propose an architecture design driven by the task's intrinsic dimensionality, and derive a theory-guided learning rule for modular networks. Results: Experiments show clear improvements over non-modular baselines in both in-distribution and out-of-distribution generalization, with dimension-independent, efficient learning on high-dimensional combinatorial tasks, thereby breaking conventional scaling laws. Core contributions: (i) a theoretical account of how modularity improves generalization; (ii) a theory-grounded learning rule for modular networks; and (iii) empirical validation on high-dimensional combinatorial tasks.
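
A schematic of the claimed scaling, in our own notation rather than the paper's formal statement: write d for the task's intrinsic input dimensionality and N(d) for the number of training samples needed to generalize on a modularly structured task.

```latex
% Schematic sample-complexity contrast (our notation, not the paper's
% formal theorem). d = intrinsic task dimensionality, N(d) = training
% samples needed to generalize on a modularly structured task.
\[
  N_{\text{non-modular}}(d) = \Omega\!\left(c^{\,d}\right), \; c > 1,
  \qquad\text{vs.}\qquad
  N_{\text{modular}}(d) = O(1) \ \text{with respect to } d .
\]
```

The exponential form and the constant c are placeholders; the paper's theorems make the bounds precise under its assumptions.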

📝 Abstract
Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional and combinatorial structure of real-world problems. However, a theoretical explanation of how modularity improves generalizability, and how to leverage task modularity while training networks remains elusive. Using recent theoretical progress in explaining neural network generalization, we investigate how the amount of training data required to generalize on a task varies with the intrinsic dimensionality of a task's input. We show theoretically that when applied to modularly structured tasks, while nonmodular networks require an exponential number of samples with task dimensionality, modular networks' sample complexity is independent of task dimensionality: modular networks can generalize in high dimensions. We then develop a novel learning rule for modular networks to exploit this advantage and empirically show the improved generalization of the rule, both in- and out-of-distribution, on high-dimensional, modular tasks.
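
To make the architectural contrast in the abstract concrete, here is a minimal PyTorch sketch, ours and not the paper's code: a monolithic MLP must fit one function of all d inputs, while a modular network assigns each low-dimensional block of the input to its own small module, so every module faces a fixed-dimensional sub-problem no matter how large d grows. The class names, block sizes, and sum-composition are illustrative assumptions.

```python
# Illustrative sketch (our code, not the paper's implementation).
# A target that decomposes as f(x) = g1(x[0:k]) + g2(x[k:2k]) + ...
# can be modeled by one small module per k-dimensional block, so no
# component ever has to learn a function of all d inputs at once.
import torch
import torch.nn as nn

class MonolithicNet(nn.Module):
    """Baseline: a single MLP over the full d-dimensional input."""
    def __init__(self, d, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.net(x)

class ModularNet(nn.Module):
    """One small module per low-dimensional input block.

    sub_dims lists the block sizes, e.g. [2, 2, 2] for d = 6. Each
    module only ever sees its own block, so its input dimensionality
    stays fixed as d grows.
    """
    def __init__(self, sub_dims, hidden=32):
        super().__init__()
        self.sub_dims = sub_dims
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for k in sub_dims
        )

    def forward(self, x):
        outs, start = [], 0
        for k, block in zip(self.sub_dims, self.blocks):
            outs.append(block(x[:, start:start + k]))
            start += k
        # Sum the module outputs; summation is one simple choice of
        # composition, assumed here purely for illustration.
        return torch.stack(outs).sum(dim=0)

x = torch.randn(8, 6)                  # batch of 8 samples, d = 6
print(MonolithicNet(6)(x).shape)       # torch.Size([8, 1])
print(ModularNet([2, 2, 2])(x).shape)  # torch.Size([8, 1])
```

If the target really factorizes across blocks, each module faces a k-dimensional learning problem regardless of d, which is the intuition behind the paper's dimension-independent sample complexity result.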
Problem

Research questions and friction points this paper is trying to address.

How does modularity improve neural network generalizability? No theoretical explanation existed.
How can training exploit a task's modular structure? A principled learning rule for modular networks was missing.
Non-modular networks need exponentially more samples as task dimensionality grows, making high-dimensional combinatorial tasks intractable (see the counting sketch below).
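
A back-of-the-envelope counting argument, ours rather than the paper's proof, shows why the factorization matters. Suppose the input consists of d binary variables and the target decomposes into independent sub-functions over blocks of k variables each:

```latex
% Distinct input patterns a learner must cover (illustrative counting
% argument, not the paper's proof).
\[
  \underbrace{2^{d}}_{\text{monolithic: all joint inputs}}
  \qquad\text{vs.}\qquad
  \underbrace{\tfrac{d}{k}\, 2^{k}}_{\text{modular: block patterns, summed over blocks}}
\]
% Example: d = 20, k = 2  =>  2^20 = 1,048,576 joint inputs,
% but only (20/2) * 2^2 = 40 block patterns.
```

Coverage grows exponentially in d for the joint view but only linearly for the modular view, matching the abstract's exponential-versus-dimension-independent contrast.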
Innovation

Methods, ideas, or system contributions that make the work stand out.

Theoretical proof that modular networks' sample complexity is independent of task dimensionality, while non-modular networks' grows exponentially.
A novel learning rule that lets modular networks exploit task modularity, improving both in- and out-of-distribution generalization.
Empirical demonstration of dimension-efficient learning on high-dimensional, modular combinatorial tasks.