Information Hidden in Gradients of Regression with Target Noise

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of recovering second-order information, such as the Hessian or the data covariance, from first-order gradients in settings where only gradients are observable. The authors propose a simple yet effective strategy: inject Gaussian noise into the regression targets and calibrate the noise variance to the batch size. Under this calibration, the empirical gradient covariance yields a robust approximation of the Hessian, with non-asymptotic guarantees that hold even far from the optimum, and recovers the data covariance up to scale. The theoretical analysis relies on sub-Gaussian input assumptions and non-asymptotic operator-norm bounds. Empirical validation on both synthetic and real-world datasets confirms the method's efficacy. The framework is readily applicable to tasks such as preconditioned optimization and adversarial risk estimation.
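A brief sketch of why this calibration exposes the Hessian in linear regression (the notation below is ours for illustration, not taken from the paper): with targets $y = x^\top w^\star + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, the per-sample gradient of the squared loss at $w$ is $g = x(x^\top w - y) = x x^\top \delta - \varepsilon x$, where $\delta = w - w^\star$. Since $\varepsilon$ is independent of $x$,

$$\mathrm{Cov}[g] = \mathrm{Cov}[x x^\top \delta] + \sigma^2 \Sigma, \qquad \Sigma = \mathbb{E}[x x^\top].$$

Choosing $\sigma^2$ large, on the order of the batch size $n$, makes the $\sigma^2 \Sigma$ term dominate, so $\mathrm{Cov}[g]/\sigma^2 \approx \Sigma$, which is the Hessian of the regression loss, even when $\delta \neq 0$, i.e. far from the optimum.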

📝 Abstract
Second-order information -- such as curvature or data covariance -- is critical for optimisation, diagnostics, and robustness. However, in many modern settings, only gradients are observable. We show that gradients alone can reveal the Hessian, which equals the data covariance $\Sigma$ in linear regression. Our key insight is a simple variance calibration: injecting Gaussian noise so that the total target-noise variance equals the batch size ensures that the empirical gradient covariance closely approximates the Hessian, even when evaluated far from the optimum. We provide non-asymptotic operator-norm guarantees under sub-Gaussian inputs. We also show that without such calibration, recovery can fail by an $\Omega(1)$ factor. The proposed method is practical (a "set target-noise variance to $n$" rule) and robust (variance $\mathcal{O}(n)$ suffices to recover $\Sigma$ up to scale). Applications include preconditioning for faster optimisation, adversarial risk estimation, and gradient-only training, for example in distributed systems. We support our theoretical results with experiments on synthetic and real data.
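As an illustration of the "set target-noise variance to $n$" rule, here is a minimal NumPy sketch. The dimensions, the synthetic $\Sigma$, and the rescaling of the gradient covariance by the injected variance are our own illustrative choices, not the paper's experimental protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 5000                      # dimension, batch size

# Ground-truth data covariance Sigma (the Hessian of the regression loss)
A = rng.normal(size=(d, d))
Sigma = A @ A.T / d + np.eye(d)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
w_star = rng.normal(size=d)
w = w_star + rng.normal(size=d)     # evaluate far from the optimum

# Calibration: set the target-noise variance to the batch size n
sigma2 = float(n)
y = X @ w_star + rng.normal(scale=np.sqrt(sigma2), size=n)

# Per-sample gradients of the squared loss 0.5 * (x^T w - y)^2
G = X * (X @ w - y)[:, None]        # shape (n, d)

# Empirical gradient covariance, rescaled by the injected noise variance,
# approximates Sigma even though w is not the minimiser
Sigma_hat = np.cov(G, rowvar=False) / sigma2

rel_err = np.linalg.norm(Sigma_hat - Sigma, 2) / np.linalg.norm(Sigma, 2)
print(f"relative operator-norm error: {rel_err:.3f}")
```

With the calibrated variance, the rescaled gradient covariance matches $\Sigma$ up to sampling error; dropping the injected noise (setting `sigma2` to a constant) leaves the $\delta$-dependent term to bias the estimate.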
Problem

Research questions and friction points this paper is trying to address.

gradient
Hessian
covariance
second-order information
target noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient covariance
Hessian recovery
target noise calibration
second-order information
linear regression