Token Perturbation Guidance for Diffusion Models

📅 2025-06-10
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Classifier-free guidance (CFG) improves generation quality and conditional alignment in diffusion models but requires specialized training and is restricted to conditional generation. This paper proposes Token Perturbation Guidance (TPG), a universal, training-free, architecture-agnostic, and condition-agnostic guidance method: it applies norm-preserving perturbation matrices to intermediate token representations, enabling sampling guidance via token reordering and dynamic feature-space modulation. TPG thus handles both unconditional and conditional generation without model retraining. On SDXL, TPG reduces unconditional-generation FID by nearly 50% while achieving prompt alignment comparable to CFG; it also generalizes seamlessly to Stable Diffusion 2.1. The implementation is open-sourced.

πŸ“ Abstract
Classifier-free guidance (CFG) has become an essential component of modern diffusion models to enhance both generation quality and alignment with input conditions. However, CFG requires specific training procedures and is limited to conditional generation. To address these limitations, we propose Token Perturbation Guidance (TPG), a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG employs a norm-preserving shuffling operation to provide effective and stable guidance signals that improve generation quality without architectural changes. As a result, TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation. We further analyze the guidance term provided by TPG and show that its effect on sampling more closely resembles CFG compared to existing training-free guidance techniques. Extensive experiments on SDXL and Stable Diffusion 2.1 show that TPG achieves nearly a 2× improvement in FID for unconditional generation over the SDXL baseline, while closely matching CFG in prompt alignment. These results establish TPG as a general, condition-agnostic guidance method that brings CFG-like benefits to a broader class of diffusion models. The code is available at https://github.com/TaatiTeam/Token-Perturbation-Guidance
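The CFG-like guidance step the abstract describes can be sketched in a few lines of plain Python. This is an illustrative reconstruction, not the authors' code: `tpg_guided_eps` is a hypothetical name, and the list-based "noise predictions" stand in for real model tensors.

```python
def tpg_guided_eps(eps_normal, eps_perturbed, scale):
    """CFG-style extrapolation away from the 'weak' prediction obtained
    with shuffled tokens:
        eps_guided = eps_perturbed + scale * (eps_normal - eps_perturbed)
    scale = 1 recovers the ordinary, unguided prediction."""
    return [p + scale * (n - p) for n, p in zip(eps_normal, eps_perturbed)]

# Toy usage with two-component noise predictions:
guided = tpg_guided_eps([1.0, 2.0], [0.5, 0.5], 3.0)  # -> [2.0, 5.0]
```

The form mirrors CFG, with the token-perturbed prediction playing the role that the unconditional prediction plays in classifier-free guidance.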
Problem

Research questions and friction points this paper is trying to address.

Enhance diffusion model generation without training changes
Provide guidance for both conditional and unconditional generation
Improve generation quality and prompt alignment in diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies perturbation matrices to token representations
Uses norm-preserving shuffling for stable guidance
Training-free and works for any input conditions
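The norm-preservation claim behind the shuffling bullet can be checked numerically. A minimal sketch, assuming the perturbation is a random permutation along the token axis (function and variable names are illustrative, not from the paper's code):

```python
import math
import random

def frob_norm(tokens):
    # Frobenius norm of a token matrix stored as a list of row vectors.
    return math.sqrt(sum(v * v for row in tokens for v in row))

rng = random.Random(0)
tokens = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]  # 3 tokens, dim 2

# Shuffling the token axis is multiplication by a permutation matrix,
# which is orthogonal, so the norm of the representation is unchanged.
perm = list(range(len(tokens)))
rng.shuffle(perm)
shuffled = [tokens[i] for i in perm]

print(math.isclose(frob_norm(tokens), frob_norm(shuffled)))  # True
```

Because the perturbation only reorders tokens, the perturbed forward pass stays in-distribution for the network's activations, which is what makes the guidance signal stable.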
Javad Rajabi
University of Toronto, Vector Institute for Artificial Intelligence
Soroush Mehraban
University of Toronto, Vector Institute for Artificial Intelligence, KITE Research Institute
Seyedmorteza Sadat
PhD student, ETH Zürich
diffusion models · generative modeling · computer vision · deep learning
Babak Taati
KITE Research Institute | Toronto Rehab - UHN & Department of Computer Science, University of Toronto
Computer Vision · Health Monitoring · Ambient Intelligence