🤖 AI Summary
This work addresses efficient, stable, high-fidelity controllable generation with diffusion models, without retraining and without backpropagating through the denoiser during sampling. The authors observe that, as noise progressively corrupts the data, only a small number of features remain informative for control. They characterize these features as the singular functions of a conditional expectation operator and learn them with a self-supervised objective. The resulting spectral basis lets arbitrary guidance signals, such as class labels, CLIP embeddings, or spatial masks, be projected directly onto the sampling trajectory, and the same representations support all three kinds of guidance without auxiliary models. The framework also reveals a phase transition in the generative process, which pinpoints the optimal time window for guidance. On CIFAR-10, the method improves conditional accuracy by 37 percentage points over the strongest training-free baseline while sampling $4\times$ faster.
📝 Abstract
We introduce Spectral Guidance, a framework for controlling diffusion models by leveraging the intrinsic geometry of the generative process. As data is progressively corrupted by noise, only a small number of features remain informative for control. We characterize them as the singular functions of a conditional expectation operator and show that they can be learned via a self-supervised objective. Once recovered, this basis enables the projection of arbitrary guidance signals, such as labels, CLIP embeddings, or masks, directly onto the sampling trajectory. This approach allows for stable, high-fidelity control without retraining or denoiser backpropagation during sampling. Empirically, we improve conditional accuracy on CIFAR-10 by 37 percentage points over the strongest training-free baseline while offering $4\times$ faster sampling. Moreover, the same representations that support label and CLIP guidance also enable spatial control, such as mask-based guidance, without auxiliary models. Finally, our framework reveals a phase transition in the generative process, pinpointing the optimal time window for effective guidance.