🤖 AI Summary
This paper studies the convergence of stochastic gradient optimization for objectives that depend on unknown nuisance parameters, such as unobserved confounders, which can shift the optimum and perturb the optimization trajectory. The authors establish non-asymptotic convergence guarantees: under appropriate conditions, notably Neyman orthogonality, classical stochastic gradient descent still achieves an $O(1/\sqrt{T})$ convergence rate. When Neyman orthogonality does not hold, they propose a variant with approximately orthogonalized gradient updates that attains similar rates, provided the influence of the nuisance parameters can be suppressed along the gradient direction. Examples connect the theory to orthogonal statistical learning, double machine learning, and causal inference. The main contribution is a non-asymptotic theoretical foundation, with explicit convergence rates, for stochastic optimization in the presence of nuisance parameters.
📝 Abstract
Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives depend on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and perturb the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with an approximately orthogonalized gradient oracle may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.
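The difference between a Neyman-orthogonal and a non-orthogonal gradient oracle can be illustrated with a standard double-machine-learning toy example, the partially linear model. The sketch below is our own illustration, not the paper's construction: the data-generating process, the deliberately biased nuisance fits, and the $1/\sqrt{t}$ step-size schedule with tail averaging are all illustrative choices. With the non-orthogonal score, the nuisance error biases the SGD limit at first order; with the orthogonal score, the bias is the product of two nuisance errors and is therefore second order.

```python
import numpy as np

# Illustrative toy (not the paper's exact setup): a partially linear model
#   Y = theta0 * D + g0(X) + eps,   D = m0(X) + v,
# with m0(X) = X and g0(X) = 2X, so E[Y|X] = theta0*X + 2X = 3.5X.
rng = np.random.default_rng(0)
n, theta0 = 50_000, 1.5
X = rng.standard_normal(n)
v = rng.standard_normal(n)
D = X + v
Y = theta0 * D + 2.0 * X + 0.3 * rng.standard_normal(n)

# Deliberately biased nuisance fits (each with error 0.1*X):
g_hat = 1.9 * X      # biased estimate of g0(X) = 2X
m_hat = 0.9 * X      # biased estimate of m0(X) = E[D|X] = X
ell_hat = 3.4 * X    # biased estimate of E[Y|X] = 3.5X

def sgd(grad, T=50_000, lr=0.1):
    """SGD with step size lr/sqrt(t); returns the tail-averaged iterate."""
    theta, avg = 0.0, 0.0
    for t in range(1, T + 1):
        i = rng.integers(n)
        theta -= (lr / np.sqrt(t)) * grad(theta, i)
        if t > T // 2:
            avg += theta
    return avg / (T - T // 2)

# Non-orthogonal score: residualize Y only; the error in g_hat biases
# the limit at first order (population minimizer here is about 1.55).
plain = sgd(lambda th, i: -(Y[i] - g_hat[i] - th * D[i]) * D[i])

# Neyman-orthogonal score: residualize both Y and D; the bias is the
# product of the two nuisance errors, hence second order (about 1.495).
orth = sgd(lambda th, i: -(Y[i] - ell_hat[i] - th * (D[i] - m_hat[i]))
                         * (D[i] - m_hat[i]))

print(f"non-orthogonal: {plain:.3f}  orthogonal: {orth:.3f}  truth: {theta0}")
```

Even though both oracles use nuisance estimates of the same quality, only the orthogonalized one recovers the target parameter up to a second-order error, which is the mechanism behind the convergence guarantees summarized above.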