🤖 AI Summary
When importance sampling (IS) is used inside stochastic optimization, the decision variable and the IS distribution depend on each other, creating a circular structure that complicates both the convergence analysis of the decision iterates and the efficiency of the sampling scheme. To address this, we propose a gradient-based algorithm that jointly updates the decision variable and a parameterized IS distribution without requiring time-scale separation between the two. Under a convex objective and mild assumptions on the IS distribution family, the method converges globally and attains the lowest possible asymptotic variance. These guarantees carry over to linearly constrained problems by incorporating a recent variant of Nesterov's dual averaging method. Experiments demonstrate substantial improvements in sample efficiency and numerical stability, particularly in rare-event simulation tasks.
📝 Abstract
Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its power, the performance of IS is often highly sensitive to the choice of the proposal distribution and frequently requires stochastic calibration techniques. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a unique challenge: the decision and the IS distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both the analysis of convergence for decision iterates and the efficiency of the IS scheme. In this paper, we propose an iterative gradient-based algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family. Furthermore, we show that these properties are preserved under linear constraints by incorporating a recent variant of Nesterov's dual averaging method.
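
To make the idea of a joint, single-timescale update concrete, the following is a minimal sketch on a toy problem and is not the algorithm analyzed in the paper. Everything in it is an assumption chosen for illustration: the objective F(x) = 0.5 x^2 - x P(Z > c) with Z ~ N(0, 1), the mean-shifted Gaussian proposal N(theta, 1), the function name `joint_is_sgd`, the step sizes, and the clipping interval for theta. Each iteration draws one sample from the current proposal and uses it for both a gradient step on the decision x and a gradient step on theta that reduces the second moment of the weighted estimator; both step sizes are of order 1/k.

```python
import math
import random


def joint_is_sgd(c=2.0, n_iters=20000, seed=0):
    """Toy loop that updates a decision x and an IS parameter theta together.

    Hypothetical problem: minimize F(x) = 0.5*x**2 - x*P(Z > c), Z ~ N(0, 1),
    whose minimizer is x* = P(Z > c). Samples come from the proposal N(theta, 1).
    """
    rng = random.Random(seed)
    x, theta = 0.0, 0.0
    for k in range(1, n_iters + 1):
        z = rng.gauss(theta, 1.0)                    # sample from the proposal
        w = math.exp(-theta * z + 0.5 * theta ** 2)  # likelihood ratio p(z)/q_theta(z)
        hit = 1.0 if z > c else 0.0                  # rare-event indicator

        # Unbiased estimate of F'(x) = x - P(Z > c) under the proposal.
        grad_x = x - w * hit
        # Stochastic gradient in theta of the second moment E_q[(w * hit)**2],
        # using d/dtheta w = w * (theta - z).
        grad_theta = w * w * (theta - z) * hit

        # Same-order step sizes for both blocks (assumed constants 1 and 5).
        x -= grad_x / (k + 10)
        theta -= 5.0 * grad_theta / (k + 10)
        # Keep theta in an assumed safe interval so the weights stay bounded.
        theta = min(max(theta, 0.0), 2.0 * c)
    return x, theta
```

Calling `joint_is_sgd()` returns the final pair `(x, theta)`. In this toy setting x should approach P(Z > 2), roughly 0.0228, while theta drifts away from 0 toward the rare-event region. Because the likelihood ratio keeps the x-gradient unbiased for any theta, the decision update does not have to wait for the proposal to converge, which is the intuition behind dropping time-scale separation.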