The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

📅 2026-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the efficient learning of Boolean $k$-juntas, a sparse learning problem on which gradient-based methods face known barriers under standard i.i.d. uniform sampling. The authors instead draw temporally correlated samples from a lazy random walk on the hypercube and train a two-layer ReLU network with a stylized form of stochastic gradient descent (SGD) on a temporal-difference loss, which compares target and predicted increments across consecutive samples. They show that SGD can exploit these temporal correlations to learn $k$-juntas efficiently: for every fixed $k$, the sample complexity is essentially linear in the ambient dimension $d$. By contrast, large-batch gradient methods with standard convex pointwise losses do not obtain the same advantage from temporal correlations.
📝 Abstract
We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.
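To make the setup in the abstract concrete, here is a minimal NumPy sketch of the three ingredients: a lazy random walk on the hypercube, a $k$-junta target, and a temporal-difference loss on a two-layer ReLU network. Everything specific in it is an assumption for illustration and is not taken from the paper: the walk stays put with probability 1/2 and otherwise flips one uniformly chosen coordinate, the junta is a parity on the first k coordinates, the loss is a squared difference between predicted and target increments, and the names `lazy_random_walk`, `parity_junta`, `two_layer_relu`, and `td_loss` are hypothetical. The paper's exact walk, loss, and stylized-SGD update may differ.

```python
import numpy as np

def lazy_random_walk(d, steps, rng, stay_prob=0.5):
    """Assumed lazy walk on {-1,+1}^d: stay with prob. stay_prob, else flip one coordinate."""
    x = rng.choice([-1.0, 1.0], size=d)
    path = np.empty((steps + 1, d))
    path[0] = x
    for t in range(1, steps + 1):
        if rng.random() >= stay_prob:
            i = rng.integers(d)   # coordinate to flip, chosen uniformly
            x[i] = -x[i]
        path[t] = x               # copies the current state into the path
    return path

def parity_junta(X, relevant):
    """Example k-junta (an assumption): parity of the relevant coordinates."""
    return np.prod(X[:, relevant], axis=1)

def two_layer_relu(X, W, b, a):
    """Two-layer ReLU network: a^T relu(W x + b), applied row-wise."""
    return np.maximum(X @ W.T + b, 0.0) @ a

def td_loss(X, y, W, b, a):
    """Assumed temporal-difference loss: squared gap between predicted and target increments."""
    pred = two_layer_relu(X, W, b, a)
    pred_inc = pred[1:] - pred[:-1]    # predicted increment between consecutive samples
    target_inc = y[1:] - y[:-1]        # target increment between consecutive samples
    return 0.5 * np.mean((pred_inc - target_inc) ** 2)

rng = np.random.default_rng(0)
d, k, width, steps = 20, 3, 16, 200
relevant = np.arange(k)                # hypothetical choice of relevant coordinates

X = lazy_random_walk(d, steps, rng)
y = parity_junta(X, relevant)

# Random initialization of the network (scaling is an arbitrary choice here).
W = rng.normal(size=(width, d)) / np.sqrt(d)
b = np.zeros(width)
a = rng.normal(size=width) / np.sqrt(width)

loss = td_loss(X, y, W, b, a)
print(f"temporal-difference loss at initialization: {loss:.4f}")
```

Because consecutive samples in this walk differ in at most one coordinate, the target increment is zero whenever an irrelevant coordinate flips. In this sketch, that is the sense in which increments across consecutive samples carry information about which coordinates the junta depends on.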
Problem

Research questions and friction points this paper is trying to address.

temporal correlations
k-juntas
gradient-based learning
sparse learning
random walks
Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal correlations
k-juntas
stochastic gradient descent
temporal-difference loss
sample complexity