🤖 AI Summary
This work addresses the high computational cost of using pretrained diffusion models as frozen teachers in downstream tasks. The teacher gradients are high-variance Monte Carlo expectations over noise levels and Gaussian noise samples, and each draw requires expensive upstream work such as rendering, simulation, or encoding. The authors propose CARV, a compute-aware variance-accounting framework that motivates a hierarchical Monte Carlo estimator: the expensive upstream computation is amortized over cheap diffusion-noise resamples, and the estimate is further sharpened by timestep importance sampling and a stratified inverse-CDF construction. In text-to-3D distillation and data attribution experiments, CARV delivers 2–3× effective compute multipliers without changing the objective, most of it from amortized reuse and roughly 25% more from importance sampling and stratification. In single-step distillation, the same techniques reduce gradient variance by an order of magnitude but do not improve downstream FID, indicating a regime where Monte Carlo variance is no longer the bottleneck.
📝 Abstract
Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: one that amortizes the expensive upstream computation over cheap diffusion-noise resamples and is sharpened by timestep importance sampling (IS) and a stratified inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers without changing the objective (most from amortized reuse; ~25% additional from IS+stratification). In single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.
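To make the estimator described in the abstract concrete, here is a minimal NumPy sketch of its three ingredients: one expensive upstream call reused across several noise resamples, timestep importance sampling, and stratified inverse-CDF draws. It is an illustration under stated assumptions, not the authors' implementation. The names `upstream`, `teacher_grad`, and `probs` are hypothetical, the timesteps are assumed to be discrete, and the target timestep distribution is assumed to be uniform.

```python
import numpy as np


def stratified_inverse_cdf_timesteps(probs, n, rng):
    """Draw n timestep indices from the proposal `probs`, one uniform per stratum."""
    cdf = np.cumsum(probs, dtype=float)
    cdf /= cdf[-1]  # guard against probs that do not sum exactly to 1
    # Stratum k covers [k/n, (k+1)/n); jitter one uniform inside each stratum.
    u = (np.arange(n) + rng.random(n)) / n
    # Inverse CDF: smallest index i with u < cdf[i].
    idx = np.searchsorted(cdf, u, side="right")
    return np.minimum(idx, len(cdf) - 1)


def amortized_teacher_gradient(upstream, teacher_grad, probs, n_resamples, rng):
    """Hierarchical MC estimate: one upstream call, many cheap noise resamples.

    Assumption: the target timestep distribution is uniform over len(probs)
    discrete timesteps, and `probs` is the importance-sampling proposal.
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()

    x = upstream()  # expensive step (e.g. a render), run exactly once

    t = stratified_inverse_cdf_timesteps(probs, n_resamples, rng)
    # Importance weight = target / proposal = (1 / T) / probs[t].
    w = 1.0 / (len(probs) * probs[t])

    # Fresh Gaussian noise for every resample, same shape as x.
    eps = rng.standard_normal((n_resamples,) + np.shape(x))
    grads = np.stack([teacher_grad(x, ti, ei) for ti, ei in zip(t, eps)])

    # Weighted average over the resamples.
    return np.tensordot(w, grads, axes=1) / n_resamples
```

In use, `upstream` would wrap the costly render or encode step and `teacher_grad` one forward pass of the frozen teacher at timestep `t` with noise `eps`. The stratified draws keep the estimate unbiased for the uniform-timestep target while spreading samples evenly across the proposal's CDF. How many resamples to take per upstream call depends on the relative cost of the two stages, which is the trade-off the abstract's compute-aware accounting is about.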