🤖 AI Summary
This work addresses the challenge of achieving smooth, monotonic, and runtime-controllable adjustment of the content–style trade-off in diffusion models without fine-tuning the backbone. The authors propose a bottlenecked activation-level control interface: a low-dimensional, prompt-conditioned latent code is learned and mapped to FiLM/AdaGN-style modulation parameters, which are applied only during the late denoising stages of a frozen U-Net. Zero-initialization guarantees that the output is identical to the base model at zero control scale, so a single scalar can continuously adjust the content–style trade-off at inference without retraining. The method combines timestep-aware gating, a compact latent architecture, and a stability diagnostic based on DDIM inversion. Under matched parameter budgets, it shows better controllability and stability than LoRA on both Stable Diffusion 1.5 and SDXL, while ControlNet and rank-1 adapters do not offer a comparable control interface.
📝 Abstract
We introduce SteeringDiffusion, a bottlenecked activation-level control interface for diffusion models that exposes a smooth, monotonic, and runtime-adjustable control surface over the content–style trade-off. Our method keeps the U-Net backbone frozen and learns a small, prompt-conditioned latent code projected to FiLM/AdaGN-style modulation parameters. A zero-initialized design guarantees exact equivalence to the base model at zero scale, while timestep-aware gating restricts modulation to later denoising stages. A single scalar at inference continuously traverses the control surface without retraining. Across experiments on Stable Diffusion 1.5 and SDXL covering multiple artistic styles, we show that SteeringDiffusion produces smooth and monotonic content–style trade-offs. Under matched parameter budgets, it outperforms LoRA in controllability and stability, while ControlNet and rank-1 adapters do not expose a comparable control surface. We further introduce an inversion-stability diagnostic based on DDIM inversion, used as a post-hoc trajectory probe, which reveals strong correlations with intervention magnitude. These results position Steering Bottlenecked Explicit Control (S-BEC) as a practical, general-purpose control interface for frozen diffusion backbones.
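The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `FiLMSteer`, the parameter names, the hard threshold gate, and the convention that the normalized timestep `t` runs from 1 (pure noise) down to 0 (clean image) are all assumptions made for illustration. It only shows how zero-initialized FiLM-style modulation, a timestep gate, and a single inference-time scalar could fit together on one activation tensor.

```python
import numpy as np


class FiLMSteer:
    """Hypothetical FiLM-style steering layer (illustrative names and shapes)."""

    def __init__(self, code_dim, channels, t_gate=0.5):
        # Zero-initialized projections: gamma and beta are 0 before training,
        # so the layer starts out as the identity on the frozen activations.
        self.W_gamma = np.zeros((channels, code_dim))
        self.W_beta = np.zeros((channels, code_dim))
        # Assumed gate threshold on a normalized timestep in [0, 1].
        self.t_gate = t_gate

    def gate(self, t):
        # Assumed convention: t = 1 is pure noise, t = 0 is the clean image,
        # so "later denoising stages" means small t. A hard gate is one choice.
        return 1.0 if t <= self.t_gate else 0.0

    def __call__(self, h, z, t, scale):
        # h: (channels, H, W) activation from the frozen backbone
        # z: (code_dim,) latent code; scale: the single runtime control scalar
        g = scale * self.gate(t)
        gamma = self.W_gamma @ z  # per-channel multiplicative modulation
        beta = self.W_beta @ z    # per-channel additive modulation
        return h * (1.0 + g * gamma[:, None, None]) + g * beta[:, None, None]
```

In this sketch the output equals `h` whenever `scale` is 0, whenever the gate is closed, or while the projections are still at their zero initialization, which mirrors the identity guarantee the abstract describes. Sweeping `scale` upward from 0 at inference then moves the activations away from the base model without touching any backbone weights.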