🤖 AI Summary
In structured stochastic optimization, there exists a fundamental trade-off between expensive full-batch queries (e.g., gradients or matrix-vector products) and cheaper sample-based queries (e.g., stochastic function evaluations, row accesses, or generative model calls).
Method: We propose the “pseudo-independent algorithms” framework, a generalization of pseudo-determinism, in which the random samples used to query the input are reused across iterative subproblems rather than drawn fresh for each one. Sharing randomness this way reduces the total number of sample queries while preserving convergence guarantees, and it unifies finite-sum minimization, top-eigenvector computation, and MDP policy optimization under a single analysis.
Contribution/Results: This work introduces a generic randomness-reuse mechanism for variance reduction that applies across diverse stochastic optimization algorithms. It improves the trade-off between full-batch queries (gradient evaluations, matrix-vector products) and sample queries (individual function evaluations, row accesses, generative model calls), reducing total query counts and practical computational cost without sacrificing theoretical guarantees.
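The core idea, paying for one draw of randomness and amortizing it over many subproblems, can be illustrated with a toy sketch. This is not the paper's algorithm; the 1-D quadratic objective, the function name, and all parameters are illustrative assumptions:

```python
import random

def run(a, reuse, iters=50, batch=20, lr=0.5, seed=0):
    """Minimize f(x) = (1/n) * sum_i (x - a_i)^2 / 2 with sampled gradients.

    Each access to an entry of `a` counts as one sample query. With
    reuse=True, a single random batch is shared by every subproblem
    (outer iteration); with reuse=False, each subproblem draws fresh samples.
    """
    rng = random.Random(seed)
    queries = 0
    x = 0.0
    vals = []
    if reuse:
        idx = [rng.randrange(len(a)) for _ in range(batch)]
        vals = [a[i] for i in idx]      # query the input once...
        queries += batch                # ...and reuse across all subproblems
    for _ in range(iters):
        if not reuse:
            idx = [rng.randrange(len(a)) for _ in range(batch)]
            vals = [a[i] for i in idx]  # fresh queries every subproblem
            queries += batch
        grad = x - sum(vals) / batch    # stochastic gradient estimate
        x -= lr * grad
    return x, queries
```

Both variants converge to a batch-mean estimate of the minimizer, but the shared-randomness variant spends `batch` sample queries in total instead of `iters * batch`, mirroring the kind of query-count savings the framework formalizes.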
📝 Abstract
We provide a general framework to improve trade-offs between the number of full batch and sample queries used to solve structured optimization problems. Our results apply to a broad class of randomized optimization algorithms that iteratively solve sub-problems to high accuracy. We show that such algorithms can be modified to reuse the randomness used to query the input across sub-problems. Consequently, we improve the trade-off between the number of gradient (full batch) and individual function (sample) queries for finite sum minimization, the number of matrix-vector multiplies (full batch) and random row (sample) queries for top-eigenvector computation, and the number of matrix-vector multiplies with the transition matrix (full batch) and generative model (sample) queries for optimizing Markov Decision Processes. To facilitate our analysis we introduce the notion of pseudo-independent algorithms, a generalization of pseudo-deterministic algorithms [Gat and Goldwasser 2011], that quantifies how independent the output of a randomized algorithm is from a randomness source.
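To make the pseudo-independence notion concrete, here is a hedged toy example, not taken from the paper: a randomized algorithm whose output agrees across almost all choices of randomness, i.e., is pseudo-deterministic in the sense of Gat and Goldwasser. The function name and parameters are illustrative:

```python
import random

def sampled_majority(bits, seed, k=201):
    """Estimate the majority bit of `bits` from k random sample queries.

    On inputs with a strongly biased majority, the output is the same for
    almost every seed: the output is (nearly) independent of the randomness
    source, even though the individual queries are random.
    """
    rng = random.Random(seed)
    sample = [bits[rng.randrange(len(bits))] for _ in range(k)]
    return int(2 * sum(sample) > k)
```

On an input that is 80% ones, every seed returns 1 with overwhelming probability; pseudo-independence quantifies this agreement between runs, which is what lets sub-problem solutions be reused without compounding the randomness they depend on.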