🤖 AI Summary
This paper studies the convergence of stochastic gradient optimization for objectives that depend on unknown nuisance parameters, such as unobserved confounders, which can shift the optimum and perturb the optimization trajectory. The authors establish non-asymptotic convergence guarantees: under appropriate conditions, notably Neyman orthogonality, classical stochastic gradient descent still achieves an $O(1/\sqrt{T})$ convergence rate. When Neyman orthogonality does not hold, they propose a variant with approximately orthogonalized gradient updates that attains similar rates, provided the influence of the nuisance parameters can be suppressed along the gradient direction. Examples connect the theory to orthogonal statistical learning, double machine learning, and causal inference. The main contribution is a non-asymptotic theoretical foundation, with explicit convergence rates, for stochastic optimization in the presence of nuisance parameters.
📝 Abstract
Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives depend on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and perturb the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with an approximately orthogonalized gradient oracle may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.
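The difference between a Neyman-orthogonal and a non-orthogonal gradient oracle can be illustrated with a standard double-machine-learning toy example, the partially linear model. The sketch below is our own illustration, not the paper's construction: the data-generating process, the deliberately biased nuisance fits, and the $1/\sqrt{t}$ step-size schedule with tail averaging are all illustrative choices. With the non-orthogonal score, the nuisance error biases the SGD limit at first order; with the orthogonal score, the bias is the product of two nuisance errors and is therefore second order.

```python
import numpy as np

# Illustrative toy (not the paper's exact setup): a partially linear model
#   Y = theta0 * D + g0(X) + eps,   D = m0(X) + v,
# with m0(X) = X and g0(X) = 2X, so E[Y|X] = theta0*X + 2X = 3.5X.
rng = np.random.default_rng(0)
n, theta0 = 50_000, 1.5
X = rng.standard_normal(n)
v = rng.standard_normal(n)
D = X + v
Y = theta0 * D + 2.0 * X + 0.3 * rng.standard_normal(n)

# Deliberately biased nuisance fits (each with error 0.1*X):
g_hat = 1.9 * X      # biased estimate of g0(X) = 2X
m_hat = 0.9 * X      # biased estimate of m0(X) = E[D|X] = X
ell_hat = 3.4 * X    # biased estimate of E[Y|X] = 3.5X

def sgd(grad, T=50_000, lr=0.1):
    """SGD with step size lr/sqrt(t); returns the tail-averaged iterate."""
    theta, avg = 0.0, 0.0
    for t in range(1, T + 1):
        i = rng.integers(n)
        theta -= (lr / np.sqrt(t)) * grad(theta, i)
        if t > T // 2:
            avg += theta
    return avg / (T - T // 2)

# Non-orthogonal score: residualize Y only; the error in g_hat biases
# the limit at first order (population minimizer here is about 1.55).
plain = sgd(lambda th, i: -(Y[i] - g_hat[i] - th * D[i]) * D[i])

# Neyman-orthogonal score: residualize both Y and D; the bias is the
# product of the two nuisance errors, hence second order (about 1.495).
orth = sgd(lambda th, i: -(Y[i] - ell_hat[i] - th * (D[i] - m_hat[i]))
                         * (D[i] - m_hat[i]))

print(f"non-orthogonal: {plain:.3f}  orthogonal: {orth:.3f}  truth: {theta0}")
```

Even though both oracles use nuisance estimates of the same quality, only the orthogonalized one recovers the target parameter up to a second-order error, which is the mechanism behind the convergence guarantees summarized above.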