Variance Reduction for Expectations with Diffusion Teachers

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high compute cost of using pretrained diffusion models as frozen teachers in downstream tasks, where teacher gradients are Monte Carlo expectations over noise levels and Gaussian noise samples, and each draw requires expensive upstream work such as rendering, simulation, or encoding. The authors propose CARV, a compute-aware variance-accounting framework that motivates a hierarchical Monte Carlo estimator: expensive upstream computation is amortized over cheap diffusion-noise resamples, and variance is reduced further by timestep importance sampling and a stratified inverse-CDF construction. In text-to-3D distillation and data attribution experiments, CARV delivers 2-3× effective compute multipliers without changing the objective, most of it from amortized reuse and roughly 25% more from importance sampling plus stratification. In single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, which marks the regime where Monte Carlo variance is no longer the bottleneck.
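To make the timestep importance sampling and stratified inverse-CDF idea concrete, here is a minimal NumPy sketch of the generic technique, not the authors' implementation. The function name `stratified_timestep_sample`, the proposal `weights`, and the uniform target over timesteps are all assumptions chosen for illustration.

```python
import numpy as np

def stratified_timestep_sample(weights, n, rng):
    """Draw n timestep indices from a proposal q proportional to `weights`.

    Uses one uniform per stratum [k/n, (k+1)/n) pushed through the inverse CDF.
    Returns the indices and importance weights 1 / (T * q_t), which correct
    back to a uniform target over the T timesteps (an illustrative assumption).
    """
    q = np.asarray(weights, dtype=float)
    q = q / q.sum()
    cdf = np.cumsum(q)
    cdf[-1] = 1.0  # guard against floating-point round-off in the last bin
    u = (np.arange(n) + rng.random(n)) / n  # stratified uniforms in [0, 1)
    idx = np.searchsorted(cdf, u, side="right")  # inverse-CDF lookup
    idx = np.minimum(idx, len(q) - 1)
    iw = 1.0 / (len(q) * q[idx])
    return idx, iw

# Hypothetical per-timestep quantity f(t) = t^2 and a proposal favoring large t.
T = 50
f = np.arange(T, dtype=float) ** 2
rng = np.random.default_rng(0)
idx, iw = stratified_timestep_sample(np.arange(T) + 1.0, n=5000, rng=rng)
estimate = np.mean(iw * f[idx])  # importance-weighted estimate of mean_t f(t)
```

Because each stratum contributes exactly one sample, the draws cover the proposal's mass evenly, and the importance weights keep the estimate unbiased for the uniform-over-timesteps average.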

📝 Abstract

Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS+stratification) without changing the objective; in single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.
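The trade-off behind amortized reuse can be illustrated with the textbook two-stage Monte Carlo cost model. This is a sketch under assumed notation, not CARV's own accounting: `var_outer` is the variance contributed by the expensive upstream draw, `var_inner` the variance of the cheap noise resamples, and `cost_up`, `cost_inner` their per-call costs. All four names and the numbers below are hypothetical.

```python
import math

def variance_times_cost(k, var_outer, var_inner, cost_up, cost_inner):
    """Variance x cost per upstream draw when it is reused for k inner resamples.

    With M upstream draws the estimator variance is (var_outer + var_inner / k) / M
    and the total cost is M * (cost_up + k * cost_inner), so M cancels in the product.
    """
    return (var_outer + var_inner / k) * (cost_up + k * cost_inner)

def optimal_reuse(var_outer, var_inner, cost_up, cost_inner):
    """Reuse count k minimizing variance x cost (standard two-stage sampling result)."""
    return math.sqrt((var_inner / var_outer) * (cost_up / cost_inner))

# Hypothetical numbers: upstream work 100x the cost of one teacher evaluation.
k_star = optimal_reuse(var_outer=1.0, var_inner=4.0, cost_up=100.0, cost_inner=1.0)
no_reuse = variance_times_cost(1, 1.0, 4.0, 100.0, 1.0)
with_reuse = variance_times_cost(round(k_star), 1.0, 4.0, 100.0, 1.0)
multiplier = no_reuse / with_reuse  # effective compute multiplier in this toy setup
```

In this toy model reuse pays off most when the upstream step is expensive and the inner noise variance is large relative to the outer variance; once `var_outer` dominates, extra resamples stop helping.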

Problem

Research questions and friction points this paper is trying to address.

variance reduction

diffusion models

Monte Carlo estimation

compute cost

teacher gradients

Innovation

Methods, ideas, or system contributions that make the work stand out.

variance reduction

diffusion teacher

Monte Carlo estimation