Variance Reduction for Expectations with Diffusion Teachers

📅 2026-05-20
🤖 AI Summary
This work addresses the high computational cost of using pretrained diffusion models as frozen teachers in downstream tasks, where the teacher gradients are Monte Carlo expectations whose variance dominates compute because each draw requires expensive upstream work. To mitigate this, the authors propose CARV, a compute-aware variance-accounting framework that motivates a hierarchical Monte Carlo estimator: expensive upstream computations are reused across cheap diffusion-noise resamples, and the estimator is sharpened with timestep importance sampling and a stratified inverse-CDF construction. In text-to-3D distillation and data attribution experiments, CARV delivers 2–3× effective compute multipliers (most from amortized reuse, about 25% additional from importance sampling and stratification) without changing the objective. In single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, which marks the regime where Monte Carlo variance is no longer the bottleneck.
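To make the idea of compute-aware variance accounting concrete, the following is the textbook two-level Monte Carlo trade-off, given as a sketch and not as the paper's own formula. All symbols are assumptions introduced here: $N$ upstream draws, $K$ noise resamples per upstream draw, $\sigma_{\text{up}}^2$ the variance of the conditional mean across upstream draws, $\sigma_{\text{noise}}^2$ the expected conditional variance of a single resample, and $c_{\text{up}}$, $c_{\text{noise}}$ the per-draw costs. Resamples are assumed conditionally independent given the upstream draw.

```latex
\begin{align}
\operatorname{Var}[\hat g] &= \frac{\sigma_{\text{up}}^2}{N} + \frac{\sigma_{\text{noise}}^2}{N K},
&
C &= N \left( c_{\text{up}} + K\, c_{\text{noise}} \right), \\
\operatorname{Var}[\hat g] \cdot C &= \left( \sigma_{\text{up}}^2 + \frac{\sigma_{\text{noise}}^2}{K} \right) \left( c_{\text{up}} + K\, c_{\text{noise}} \right),
&
K^\star &= \sqrt{ \frac{\sigma_{\text{noise}}^2 \, c_{\text{up}}}{\sigma_{\text{up}}^2 \, c_{\text{noise}}} }.
\end{align}
```

Under these assumptions, the variance-times-cost product is minimized at $K^\star$, so amortized reuse pays off most when upstream work is expensive relative to a teacher evaluation and the noise-induced variance is large relative to the upstream variance.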
📝 Abstract
Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS+stratification) without changing the objective; in single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.
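The three ingredients named in the abstract (amortized upstream reuse, timestep importance sampling, and stratified inverse-CDF sampling) can be sketched on a toy scalar problem as follows. This is a minimal illustration, not the authors' implementation: the function names `stratified_inverse_cdf` and `hierarchical_estimate`, the uniform target distribution over timesteps, and the scalar integrand are all assumptions made for the example.

```python
import numpy as np

def stratified_inverse_cdf(probs, k, rng):
    """Draw k indices from a discrete distribution, one per equal-width stratum of [0, 1)."""
    cdf = np.cumsum(probs)
    cdf = cdf / cdf[-1]                       # guard against rounding so cdf[-1] == 1
    u = (np.arange(k) + rng.random(k)) / k    # one uniform draw inside each stratum
    idx = np.searchsorted(cdf, u, side="right")
    return np.minimum(idx, len(probs) - 1)

def hierarchical_estimate(upstream, integrand, probs, n_outer, k_inner, rng):
    """Estimate E_x E_{t ~ Uniform} E_eps [integrand(x, t, eps)] (hypothetical toy setup).

    upstream(rng) stands in for the expensive step (rendering, simulation, encoding)
    and is called once per outer draw; its result is reused for k_inner cheap
    (timestep, noise) resamples.
    """
    probs = np.asarray(probs, dtype=float)
    num_steps = len(probs)
    total = 0.0
    for _ in range(n_outer):
        x = upstream(rng)                                  # expensive, amortized
        t = stratified_inverse_cdf(probs, k_inner, rng)    # timesteps from proposal
        eps = rng.standard_normal(k_inner)                 # cheap Gaussian noise
        weights = 1.0 / (num_steps * probs[t])             # uniform target / proposal
        total += np.mean(weights * integrand(x, t, eps))
    return total / n_outer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_steps = 10
    # Proposal proportional to the (assumed known) magnitude of the integrand in t.
    proposal = np.arange(1, num_steps + 1, dtype=float)
    proposal /= proposal.sum()
    estimate = hierarchical_estimate(
        upstream=lambda r: 2.0 + 0.1 * r.standard_normal(),
        integrand=lambda x, t, eps: x * (t + 1) + 0.5 * eps,
        probs=proposal,
        n_outer=200,
        k_inner=8,
        rng=rng,
    )
    print(f"estimate = {estimate:.3f} (true value is 11.0)")
```

The importance weights keep the estimator unbiased for the uniform-timestep objective, so the objective itself is unchanged; only the variance per unit of compute differs. In practice the proposal over timesteps would have to be estimated, which this sketch sidesteps by assuming it is known.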
Problem

Research questions and friction points this paper is trying to address.

variance reduction
diffusion models
Monte Carlo estimation
compute cost
teacher gradients
Innovation

Methods, ideas, or system contributions that make the work stand out.

variance reduction
diffusion teacher
Monte Carlo estimation
importance sampling
compute amortization