Robust Learning in Bayesian Parallel Branching Graph Neural Networks: The Narrow Width Limit

📅 2024-07-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the conventional wisdom that "wider networks generalize better" by investigating the learning dynamics of Bayesian Parallel Branching Graph Neural Networks (BPB-GNNs) in the narrow-width regime, where the width is asymptotically smaller than the number of training samples. Method: integrating Bayesian deep learning, kernel methods, and symmetry-breaking theory, we establish the first analytical framework for parallel branching networks in the narrow-width limit. Our analysis reveals a kernel-renormalization-induced symmetry breaking across branches, which decouples the readout norm from the architectural hyperparameters and renders it dependent solely on the intrinsic structure of the data. Results: theoretically and empirically, we demonstrate that narrow-width BPB-GNNs achieve test accuracy comparable to, or exceeding, that of their wide-width counterparts in bias-limited settings, while exhibiting enhanced robustness. Crucially, this narrow-width effect is an architectural hallmark of parallel branching GNNs, validated across multiple graph learning benchmarks.
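To make the branching architecture concrete, below is a minimal sketch of a parallel branching GNN in PyTorch. It assumes a simple parameterization in which branch k sees features propagated k hops by a fixed normalized adjacency operator, passes them through its own hidden layer, and applies its own linear readout, with the branch outputs summed in a residual-style fashion. All class, argument, and variable names are illustrative assumptions; the paper's exact BPB-GNN construction and its Bayesian (posterior-averaged) training are not reproduced here.

```python
# Minimal sketch of a parallel branching GNN (illustrative; not the paper's exact model).
import torch
import torch.nn as nn


class ParallelBranchGNN(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, num_branches: int):
        super().__init__()
        # One hidden layer and one readout per branch; hidden_dim is the "width".
        self.hidden = nn.ModuleList(
            nn.Linear(in_dim, hidden_dim, bias=False) for _ in range(num_branches)
        )
        self.readout = nn.ModuleList(
            nn.Linear(hidden_dim, out_dim, bias=False) for _ in range(num_branches)
        )

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # x: node features [num_nodes, in_dim]; adj_norm: normalized adjacency [num_nodes, num_nodes].
        out = torch.zeros(x.shape[0], self.readout[0].out_features, device=x.device)
        propagated = x
        for k in range(len(self.hidden)):
            # Branch k operates on the k-hop propagated features and has its own readout.
            out = out + self.readout[k](self.hidden[k](propagated))
            propagated = adj_norm @ propagated
        return out
```

The narrow-width regime studied in the paper corresponds to choosing hidden_dim much smaller than the number of training examples; per the summary, that is the setting in which kernel renormalization breaks the symmetry between branches.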

📝 Abstract
The infinite width limit of random neural networks is known to result in a Neural Network Gaussian Process (NNGP) (Lee et al. [2018]), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al. [2019]). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Graph Neural Network (BPB-GNN), an architecture that resembles residual networks. We demonstrate that when the width of a BPB-GNN is significantly smaller than the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-GNN in the narrow width limit is generally superior or comparable to that achieved in the wide width limit in bias-limited scenarios. Furthermore, the readout norms of each branch in the narrow width limit are mostly independent of the architectural hyperparameters but generally reflective of the nature of the data. Our results characterize a newly defined narrow-width regime for parallel branching networks in general.
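As a hedged illustration of the readout-norm observable mentioned in the abstract, the snippet below reuses the hypothetical ParallelBranchGNN sketch above and computes the squared Frobenius norm of each branch's readout weights, the kind of per-branch quantity the paper reports as being largely independent of architectural hyperparameters in the narrow width limit.

```python
import torch

# Per-branch squared readout norms for the ParallelBranchGNN sketch above
# (hypothetical helper; names are illustrative, not taken from the paper).
def branch_readout_norms(model) -> list[float]:
    with torch.no_grad():
        return [float((layer.weight ** 2).sum()) for layer in model.readout]

# Illustrative comparison of a narrow and a wide model (untrained weights here;
# the paper's comparison concerns trained / posterior-sampled networks).
narrow = ParallelBranchGNN(in_dim=128, hidden_dim=8, out_dim=4, num_branches=3)
wide = ParallelBranchGNN(in_dim=128, hidden_dim=1024, out_dim=4, num_branches=3)
print(branch_readout_norms(narrow))
print(branch_readout_norms(wide))
```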
Problem

Research questions and friction points this paper is trying to address.

Investigates the narrow width limit of Bayesian Parallel Branching Graph Neural Networks.
Challenges notion that larger widths always improve generalization.
Demonstrates superior performance in narrow width for bias-limited scenarios.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explores the narrow width limit of Bayesian Parallel Branching Graph Neural Networks
Demonstrates robust learning via symmetry breaking in kernel renormalization
Shows narrow width outperforms wide width in bias-limited scenarios
Zechen Zhang
Harvard University
deep learning theory, alignment, mechanistic interpretability, language model agents
H. Sompolinsky
Center for Brain Science, Harvard University; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University; Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem