Limitations of SGD for Multi-Index Models Beyond Statistical Queries

📅 2026-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work studies a fundamental limitation of standard stochastic gradient descent (SGD) in learning target functions that depend only on a low-dimensional projection of the input, i.e., single- and multi-index models. Existing statistical query (SQ) lower bounds fail to capture this limitation faithfully because they rest on unrealistic assumptions about SGD's noise structure. The paper develops a new, non-SQ theoretical framework that combines multi-index modeling with a direct analysis of the gradient dynamics, revealing intrinsic learning bottlenecks of standard SGD without nonstandard assumptions such as trajectory constraints or vanishingly small learning rates. Beyond providing analytical tools that better reflect the actual dynamics of SGD, the framework corrects potentially misleading predictions arising from the SQ framework, thereby clarifying the fundamental capabilities and limitations of SGD across a broad range of architectures, including deep neural networks.
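
For concreteness, below is a minimal sketch of the setting the summary describes: a multi-index target (a function of a low-dimensional projection of the input) learned by a two-layer network with plain online SGD. All dimensions, the link function, the architecture, and the hyperparameters are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: ambient dimension d, index dimension k.
# The target depends on x only through the k-dimensional projection W @ x.
d, k = 100, 2
W = rng.standard_normal((k, d)) / np.sqrt(d)

def target(x):
    z = W @ x                        # low-dimensional projection
    return np.tanh(z[0]) + z[1]**2   # illustrative link function g

# Two-layer student trained with plain ("vanilla") online SGD: one fresh
# sample per step, a fixed learning rate, and no sphere constraint --
# i.e., without the algorithmic modifications the paper's framework avoids.
m, lr = 64, 0.01
a = rng.standard_normal(m) / np.sqrt(m)       # second-layer weights
V = rng.standard_normal((m, d)) / np.sqrt(d)  # first-layer weights

for step in range(5000):
    x = rng.standard_normal(d)       # fresh Gaussian sample
    h = np.tanh(V @ x)               # hidden activations
    err = a @ h - target(x)          # squared-loss residual
    a -= lr * err * h                # gradient step on second layer
    V -= lr * err * np.outer(a * (1 - h**2), x)  # and on first layer
```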

📝 Abstract
Understanding the limitations of gradient methods, and stochastic gradient descent (SGD) in particular, is a central challenge in learning theory. To that end, a commonly used tool is the Statistical Queries (SQ) framework, which studies performance limits of algorithms based on noisy interaction with the data. However, it is known that the formal connection between the SQ framework and SGD is tenuous: Existing results typically rely on adversarial or specially-structured gradient noise that does not reflect the noise in standard SGD, and (as we point out here) can sometimes lead to incorrect predictions. Moreover, many analyses of SGD for challenging problems rely on non-trivial algorithmic modifications, such as restricting the SGD trajectory to the sphere or using very small learning rates. To address these shortcomings, we develop a new, non-SQ framework to study the limitations of standard vanilla SGD, for single-index and multi-index models (namely, when the target function depends on a low-dimensional projection of the inputs). Our results apply to a broad class of settings and architectures, including (potentially deep) neural networks.
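
As a point of reference, here is a minimal sketch of the SQ interaction model the abstract contrasts SGD with: the learner sees only expectations of chosen query functions, each perturbed by an arbitrary amount of bounded magnitude. The query, tolerance, and data below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_oracle(query, sample_pairs, tol):
    """Answer a statistical query: the mean of `query` over the data,
    perturbed by an amount of magnitude at most `tol`.  (Here the
    perturbation is random; SQ lower bounds allow it to be adversarial,
    which is the noise model the abstract says does not match SGD.)"""
    mean = np.mean([query(x, y) for x, y in sample_pairs])
    return mean + tol * rng.uniform(-1.0, 1.0)

# Casting one gradient step as SQ calls: each coordinate of the
# population gradient is one query.  Real minibatch SGD instead sees
# i.i.d. sample noise, which is why SQ lower bounds can mispredict
# SGD's actual behavior.
data = [(rng.standard_normal(3), rng.uniform(-1, 1)) for _ in range(1000)]
grad_coord = sq_oracle(lambda x, y: x[0] * y, data, tol=0.01)
```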
Problem

Research questions and friction points this paper is trying to address.

Stochastic Gradient Descent
Statistical Queries
Multi-Index Models
Learning Theory
Gradient Noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-SQ Framework
Vanilla SGD
Multi-Index Models
Gradient Noise
Learning Theory