NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unclear mechanisms by which feed-forward networks (FFNs) in large language models organize information in high-dimensional latent spaces, a gap that persists partly for lack of effective tools for analyzing their dynamic properties. The authors propose NerVE, a lightweight, memory-efficient framework that introduces a unified spectral analysis of FFN activations. By integrating four complementary metrics (spectral entropy, participation ratio, eigenvalue early enrichment, and Jensen-Shannon divergence), the method systematically reveals how nonlinear activations and optimizer geometry jointly regulate the utilization of latent dimensions. NerVE recovers stable spectral signatures across diverse architectures and scales, including Transformers and MLP-Mixers, shows how these signatures correlate with model generalization, and offers actionable insights for architecture and optimizer design.
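
The four metrics are standard spectral statistics, so a compact sketch helps make them concrete. The snippet below is illustrative only (the function name, the epsilon guard, and the top-k choice are assumptions, not the paper's actual API); it computes the metrics from the eigenvalues of an FFN activation covariance matrix.

```python
# Hypothetical sketch of the four spectral metrics named in the summary,
# computed from the eigenvalues of an FFN activation covariance matrix.
import numpy as np

def spectral_metrics(eigvals, ref_eigvals=None, top_k=10):
    """Dispersion/dimensionality statistics of a covariance eigenspectrum."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    lam = np.clip(lam, 0.0, None)          # covariance eigenvalues are >= 0
    p = lam / lam.sum()                    # normalize to a distribution

    # Spectral entropy: how evenly variance is dispersed across eigenmodes.
    entropy = -np.sum(p * np.log(p + 1e-12))

    # Participation ratio: effective number of utilized latent dimensions.
    pr = lam.sum() ** 2 / np.sum(lam ** 2)

    # Early enrichment: fraction of variance carried by the top-k modes
    # (top-heaviness of the spectrum).
    enrichment = lam[:top_k].sum() / lam.sum()

    out = {"spectral_entropy": entropy,
           "participation_ratio": pr,
           "early_enrichment": enrichment}

    # Jensen-Shannon divergence against a reference spectrum (e.g., an
    # earlier checkpoint or another layer) to track distributional shift.
    if ref_eigvals is not None:
        q = np.clip(np.sort(np.asarray(ref_eigvals, float))[::-1], 0, None)
        q = q / q.sum()
        n = max(len(p), len(q))
        p_, q_ = np.pad(p, (0, n - len(p))), np.pad(q, (0, n - len(q)))
        m = 0.5 * (p_ + q_)
        kl = lambda a, b: np.sum(a * np.log((a + 1e-12) / (b + 1e-12)))
        out["js_divergence"] = 0.5 * kl(p_, m) + 0.5 * kl(q_, m)
    return out
```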

📝 Abstract
We introduce NerVE, a unified eigenspectral framework for understanding how feed-forward networks (FFNs) in large language models (LLMs) organize and regulate information flow in high-dimensional latent space. Despite FFNs dominating the parameter budget, their high-dimensional dynamics remain poorly understood. NerVE addresses this gap through lightweight, memory-efficient tracking of eigenspectrum dynamics via four complementary metrics: Spectral Entropy (dispersion), Participation Ratio (effective dimensionality), Eigenvalue Early Enrichment (top-heaviness), and Jensen-Shannon divergence (distributional shifts). Our key insight is that FFN nonlinearities reinject variance across eigenmodes, fundamentally governing latent dimension utilization, and that optimizer geometry strongly modulates the extent of this variance reinjection. We validate NerVE across model scales and diverse architectural and optimizer configurations, each of which uniquely shapes FFN dynamics: normalization schemes control variance flow; FFN weight geometries constrain the latent space; positional encodings and activation functions regulate information flow; and optimizer choices redistribute effective capacity across depth. Across these settings, NerVE consistently recovers stable spectral signatures that correlate with a model's generalization ability and respond predictably to design choices. These signatures generalize beyond Transformers to MLP-Mixer architectures, providing actionable guidance for architecture and optimizer choices beyond trial-and-error.
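
The abstract's "lightweight, memory-efficient tracking" suggests accumulating activation statistics on the fly rather than storing full activations. A minimal sketch of one way to do this, assuming a PyTorch model and a hypothetical `model.layers[i].ffn` module path (both assumptions, not details from the paper), uses a forward hook to stream an uncentered covariance and eigendecomposes it afterward.

```python
# Minimal sketch: stream the second-moment matrix of FFN hidden activations
# via a forward hook, then extract the eigenspectrum for the metrics above.
import torch

class SpectrumTracker:
    def __init__(self, dim):
        self.cov = torch.zeros(dim, dim)   # running sum of x^T x
        self.count = 0

    def hook(self, module, inputs, output):
        # Flatten (batch, seq, dim) activations to (tokens, dim).
        x = output.detach().float().reshape(-1, output.shape[-1])
        self.cov += x.T @ x                # accumulate second moments
        self.count += x.shape[0]

    def eigenspectrum(self):
        # Eigenvalues of the (uncentered) activation covariance,
        # returned in descending order.
        lam = torch.linalg.eigvalsh(self.cov / max(self.count, 1))
        return lam.flip(0)

# Usage (assuming `model.layers[i].ffn` exposes the FFN hidden output):
# tracker = SpectrumTracker(dim=ffn_hidden_dim)
# handle = model.layers[i].ffn.register_forward_hook(tracker.hook)
# ... run forward passes over an evaluation batch ...
# eigvals = tracker.eigenspectrum()
# handle.remove()
```

Because only a dim-by-dim matrix is kept per tracked layer, memory cost is independent of sequence length and batch count, which is consistent with the abstract's memory-efficiency claim.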
Problem

Research questions and friction points this paper is trying to address.

feed-forward networks
eigenspectrum dynamics
large language models
latent space
information flow
Innovation

Methods, ideas, or system contributions that make the work stand out.

eigenspectrum dynamics
feed-forward networks
spectral entropy
variance reinjection
optimizer geometry