Stochastic Optimization with Optimal Importance Sampling

📅 2025-04-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

When importance sampling (IS) is used inside stochastic optimization, the best IS distribution depends on the decision variable, while the decision iterates in turn depend on the IS distribution. This circular coupling complicates both the convergence analysis and the sampling efficiency. The paper proposes a gradient-based algorithm that updates the decision variable and a parameterized IS distribution jointly, without time-scale separation between the two updates. Under a convex objective and mild assumptions on the IS distribution family, the method converges globally and attains the lowest possible asymptotic variance. A recent variant of Nesterov's dual averaging preserves these guarantees for linearly constrained problems. Experiments are reported to show improved sample efficiency and numerical stability, particularly in rare-event simulation tasks.

📝 Abstract

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its power, the performance of IS is often highly sensitive to the choice of the proposal distribution and frequently requires stochastic calibration techniques. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a unique challenge: the decision and the IS distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both the analysis of convergence for decision iterates and the efficiency of the IS scheme. In this paper, we propose an iterative gradient-based algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family. Furthermore, we show that these properties are preserved under linear constraints by incorporating a recent variant of Nesterov's dual averaging method.
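One way to write down the circular structure described in the abstract is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: p is the nominal density of the random input Z, q_θ is a parameterized IS density with p absolutely continuous with respect to q_θ, ℓ_θ = p / q_θ is the likelihood ratio, and gradients and expectations are assumed to be interchangeable.

```latex
\begin{align*}
\min_{x}\; F(x)
  &= \mathbb{E}_{Z \sim p}\big[ f(x, Z) \big]
   = \mathbb{E}_{Z \sim q_\theta}\big[ f(x, Z)\, \ell_\theta(Z) \big],
  \qquad \ell_\theta(z) = \frac{p(z)}{q_\theta(z)}, \\
\nabla F(x)
  &= \mathbb{E}_{Z \sim q_\theta}\big[ \nabla_x f(x, Z)\, \ell_\theta(Z) \big], \\
\theta^\star(x)
  &\in \arg\min_{\theta}\;
     \mathbb{E}_{Z \sim q_\theta}\big[ \lVert \nabla_x f(x, Z) \rVert^2\, \ell_\theta(Z)^2 \big].
\end{align*}
```

The gradient estimator is unbiased for any admissible θ, but its second moment depends on both x and θ. The variance-minimizing parameter θ*(x) therefore moves with the decision x, while the quality of the iterates for x depends on the current θ.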

Problem

Research questions and friction points this paper is trying to address.

Optimizing stochastic systems in which the decision variable and the sampling distribution depend on each other

Reducing variance in Monte Carlo methods via importance sampling

Ensuring convergence and efficiency in iterative gradient-based algorithms
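The variance-reduction idea behind the second point can be illustrated with a minimal sketch. It assumes a toy target that does not come from the paper: the tail probability P(Z > 4) for a standard normal Z, estimated by sampling from a mean-shifted normal proposal and reweighting with the likelihood ratio. The function name and parameter values are hypothetical.

```python
import math
import random


def is_tail_probability(threshold, shift, n_samples, seed=0):
    """Estimate P(Z > threshold), Z ~ N(0, 1), by sampling from N(shift, 1).

    Toy illustration only: the Gaussian mean-shift proposal is an assumption
    of this sketch, not a construction taken from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            # Likelihood ratio N(0,1) / N(shift,1) evaluated at x.
            weight = math.exp(-shift * x + 0.5 * shift * shift)
            total += weight
            total_sq += weight * weight
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_samples)


# Closed-form reference value for the Gaussian tail probability.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
estimate, std_error = is_tail_probability(threshold=4.0, shift=4.0, n_samples=20000)
print(f"exact={exact:.3e} estimate={estimate:.3e} std_error={std_error:.1e}")
```

With plain Monte Carlo, 20,000 samples would usually contain no exceedance of the threshold at all, so the estimate would typically be zero. Centering the proposal on the threshold makes the event common, and the weights correct for the change of distribution.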

Innovation

Methods, ideas, or system contributions that make the work stand out.

A gradient-based algorithm updates the decision variable and the IS distribution jointly

Achieves lowest asymptotic variance with global convergence

Preserves properties under linear constraints via Nesterov variant
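The joint update in the first point can be sketched on a toy problem. Everything below is an assumption made for illustration and is not the paper's algorithm: the objective F(x) = 0.5 x^2 + K E[max(Z - x, 0)] with Z ~ N(0, 1), the Gaussian mean-shift IS family N(θ, 1), the box projections, and the step-size constants. The sketch uses plain projected stochastic gradient steps and omits the dual averaging variant mentioned for linear constraints. Each iteration draws one sample from the current IS distribution and uses it for both updates, with step sizes of the same order.

```python
import math
import random

PENALTY = 100.0  # hypothetical cost K on the shortfall max(Z - x, 0)


def likelihood_ratio(z, theta):
    """Density ratio N(0, 1) / N(theta, 1) evaluated at z."""
    return math.exp(-theta * z + 0.5 * theta * theta)


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def joint_sgd(n_iters=20000, seed=0, theta_step_scale=200.0):
    """Single-time-scale updates of decision x and IS mean theta (toy sketch)."""
    rng = random.Random(seed)
    x, theta = 0.0, 0.0
    for k in range(1, n_iters + 1):
        z = rng.gauss(theta, 1.0)            # one draw from the current IS law
        w = likelihood_ratio(z, theta)
        hit = 1.0 if z > x else 0.0
        # Unbiased IS estimate of F'(x) = x - K * P(Z > x).
        grad_x = x - PENALTY * w * hit
        # Estimate of d/dtheta of the second moment E_q[w^2 * 1{Z > x}].
        grad_theta = w * w * hit * (theta - z)
        step = 1.0 / (k + 10)                # same 1/k order for both updates
        x = clip(x - step * grad_x, 0.0, 5.0)
        theta = clip(theta - theta_step_scale * step * grad_theta, 0.0, 4.0)
    return x, theta


def exact_minimizer():
    """Solve F'(x) = x - K * P(Z > x) = 0 by bisection on [0, 5]."""
    lo, hi = 0.0, 5.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        tail = 0.5 * math.erfc(mid / math.sqrt(2.0))
        if mid - PENALTY * tail < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_final, theta_final = joint_sgd()
x_star = exact_minimizer()
print(f"x_final={x_final:.3f} x_star={x_star:.3f} theta_final={theta_final:.3f}")
```

The estimate of the decision gradient stays unbiased whatever the value of θ, so the IS parameter affects only the noise in the x iterates. The θ update tries to reduce that noise as x moves, which is the coupling the paper analyzes. The step-size constants here were chosen by hand for this toy case and would need retuning elsewhere.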

🔎 Similar Papers

Multiple importance sampling for stochastic gradient estimation