🤖 AI Summary
This work investigates the mechanism underlying the spontaneous emergence of low-rank structure in neural network weights during training. Under significantly weaker assumptions than prior work (arbitrary depth and width, full-parameter training, smooth loss, infinitesimal regularization, and convergence only to a second-order stationary point), we establish the universality of this phenomenon. We introduce a key "derandomization lemma" that, combined with expectation-function analysis and perturbed gradient descent, rigorously proves that the first-layer weight matrix converges to a low-rank structure during training. This structural bias substantially reduces the sample complexity required for generalization. Moreover, it enables end-to-end, provably correct neural solvers for combinatorial optimization problems, including MAX-CUT approximation and Johnson-Lindenstrauss embedding, by leveraging the emergent low-rank geometry. Crucially, our theory dispenses with strong initialization requirements, architecture-specific constraints, and stringent convergence assumptions (e.g., global optimality), thereby providing, for the first time, a rigorous foundation for structural discovery under broad, practically relevant settings.
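The emergent low-rank structure described above can be diagnosed directly from the singular-value spectrum of the first-layer weight matrix. The following is a minimal sketch, not the paper's method: it builds a synthetic stand-in for a trained weight matrix (a rank-2 signal plus the small residual that tiny regularization would leave behind) and counts singular values above a relative tolerance; all sizes and thresholds here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a first-layer weight matrix after training:
# a rank-2 signal plus small residual noise (illustrative assumption,
# mimicking the low-rank structure the paper proves emerges).
d, m, r = 64, 128, 2
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, d))
W = U @ V + 1e-3 * rng.standard_normal((m, d))

# Effective rank: number of singular values exceeding a tolerance
# relative to the largest singular value.
s = np.linalg.svd(W, compute_uv=False)
eff_rank = int(np.sum(s > 1e-2 * s[0]))
print(eff_rank)  # prints 2: low effective rank despite full-size W
```

A spectrum like this (a few large singular values, then a sharp drop) is the practical signature of the structural bias the summary refers to.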
📄 Abstract
Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of Mousavi-Hosseini et al. (2023) analyzes a multiple-index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce the sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions; more specifically, we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g., perturbed gradient descent (PGD). At the core of our approach is a key $\textit{derandomization}$ lemma, which states that optimizing the function $\mathbb{E}_{\mathbf{x}}\left[g_\theta(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ converges to a point where $\mathbf{W} = \mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains, including an end-to-end approximation for MAX-CUT and computing Johnson-Lindenstrauss embeddings.
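The derandomization claim can be illustrated numerically. Below is a minimal sketch, not the paper's proof technique: it runs plain gradient descent on a Monte-Carlo estimate of $\mathbb{E}_{\mathbf{x}}\left[g_\theta(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ for the illustrative (assumed) choice $g(\mathbf{z}) = \|\mathbf{z}\|^2/2$ with $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, for which the population minimizer indeed has $\mathbf{W} = \mathbf{0}$, consistent with the lemma's conclusion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice of g (an assumption for this demo): g(z) = ||z||^2 / 2,
# with x ~ N(0, I). The population objective is then (||W||_F^2 + ||b||^2)/2,
# so the minimizer has W = 0, matching the derandomization lemma.
d, m, n = 8, 4, 4096
W = rng.standard_normal((m, d))
b = rng.standard_normal(m)
X = rng.standard_normal((n, d))  # samples approximating E_x

lr = 0.1
for _ in range(300):
    Z = X @ W.T + b          # (n, m) pre-activations W x_i + b
    # gradient of (1/2n) * sum_i ||z_i||^2 w.r.t. W and b
    gW = Z.T @ X / n
    gb = Z.mean(axis=0)
    W -= lr * gW
    b -= lr * gb

print(np.linalg.norm(W))  # ~0: gradient descent drives W to zero
```

For this quadratic choice of $g$ the empirical objective is strongly convex in $\mathbf{W}$ whenever the sample covariance is positive definite, so the iterates contract geometrically toward $\mathbf{W} = \mathbf{0}$; the lemma's content is that an analogous conclusion holds for general smooth $g_\theta$ at second-order stationary points.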