🤖 AI Summary
Diffusion-based image generation often suffers from semantic inconsistency and hallucination. Existing inference-time guidance methods typically rely on external signals or model modifications, incurring additional computational or architectural overhead. This paper proposes a plug-and-play, computationally efficient sampling guidance method: at each step it uses the intermediate sample as a projection basis, isolates the tangential component of the estimated score with respect to that basis, and amplifies it to correct the sampling trajectory directly. A first-order Taylor expansion is used to show that amplifying the tangential component steers the state toward higher-probability regions. The approach requires no architectural changes to the diffusion model, introduces no external supervision, and adds negligible computational cost. Evaluation across multiple diffusion samplers, including DDPM, DDIM, and PNDM, shows improved semantic consistency and fine-grained fidelity of generated images, with consistent gains that do not come at the expense of sampling efficiency.
📝 Abstract
Recent diffusion models achieve state-of-the-art performance in image generation, but often suffer from semantic inconsistencies or hallucinations. While various inference-time guidance methods can enhance generation, they often operate indirectly by relying on external signals or architectural modifications, which introduces additional computational overhead. In this paper, we propose Tangential Amplifying Guidance (TAG), a more efficient and direct guidance method that operates solely on trajectory signals without modifying the underlying diffusion model. TAG uses an intermediate sample as a projection basis and amplifies the tangential components of the estimated scores with respect to this basis to correct the sampling trajectory. We formalize this guidance process with a first-order Taylor expansion, which shows that amplifying the tangential component steers the state toward higher-probability regions, thereby reducing inconsistencies and enhancing sample quality. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal additional computation, offering a new perspective on diffusion guidance.
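The projection step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function name `amplify_tangential`, the scale parameter `eta`, and the choice to normalize the flattened intermediate sample into a single unit direction are all assumptions made for illustration. The sketch splits an estimated score into the part parallel to the intermediate sample (normal) and the remainder (tangential), then rescales only the tangential part.

```python
import numpy as np


def amplify_tangential(x, score, eta=1.5, eps=1e-12):
    """Scale the component of `score` that is orthogonal to `x` by `eta`.

    Hypothetical helper: `x` plays the role of the intermediate sample,
    `score` the estimated score at that sample, and `eta` an assumed
    amplification factor (eta = 1 leaves the score unchanged).
    """
    x_flat = np.asarray(x, dtype=float).reshape(-1)
    s_flat = np.asarray(score, dtype=float).reshape(-1)

    # Unit vector along the intermediate sample, used as the projection basis.
    unit = x_flat / (np.linalg.norm(x_flat) + eps)

    # Normal part: projection of the score onto the basis direction.
    normal = np.dot(s_flat, unit) * unit
    # Tangential part: what remains after removing the normal part.
    tangential = s_flat - normal

    # Keep the normal part as is and amplify only the tangential part.
    return (normal + eta * tangential).reshape(np.shape(score))


# Usage: a 2-D toy example where x points along the first axis.
x_t = np.array([1.0, 0.0])
score_t = np.array([2.0, 3.0])
guided = amplify_tangential(x_t, score_t, eta=2.0)
```

In this toy case the first coordinate of the score is parallel to `x_t` and is left alone, while the second coordinate is orthogonal to it and is doubled. In an actual sampler, the guided score would replace the original estimate inside the scheduler update; how TAG does that for each sampler is not specified in this abstract.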