🤖 AI Summary
This work addresses two problems: the high inference cost of Classifier-Free Guidance for diffusion models and its incompatibility with distilled or single-step generators, and the lack of theoretical grounding in existing attention-based extrapolation methods. The authors establish a novel connection by modeling attention dynamics as fixed-point iterations in Modern Hopfield Networks, revealing that attention extrapolation is a special case of Anderson acceleration. Building on this insight, they propose Geometry-Aware Attention Guidance (GAG), a plug-and-play guidance mechanism grounded in directional decomposition and a weak-contractivity analysis. GAG stabilizes the accelerated iteration, integrates seamlessly into existing diffusion frameworks, and significantly improves generation quality without sacrificing computational efficiency.
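The summary does not reproduce the paper's derivation, but the claimed connection can be sketched in standard Modern Hopfield Network notation (the symbols $\xi$, $X$, $\beta$, $\gamma$, $A_c$, $A_u$ below are assumed for illustration, not taken from the paper):

```latex
% Modern Hopfield retrieval is a fixed-point iteration of an
% attention-style map (Ramsauer et al., 2020):
\xi^{t+1} = f(\xi^{t}) = X \,\mathrm{softmax}\!\left(\beta X^{\top} \xi^{t}\right)

% Anderson acceleration with memory one mixes two successive
% evaluations of f, with mixing weights that sum to one:
\xi^{t+1} = (1+\gamma)\, f(\xi^{t}) - \gamma\, f(\xi^{t-1})

% Reading f(\xi^{t}) as a "strong" attention output A_c and
% f(\xi^{t-1}) as a "weak" one A_u recovers attention extrapolation:
\hat{A} = A_c + \gamma\,(A_c - A_u)
```

In this reading, the familiar extrapolation between two attention outputs is exactly a one-step-memory Anderson update on the Hopfield fixed-point iteration, which is the sense in which the paper calls it a special case.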
📝 Abstract
Classifier-Free Guidance (CFG) has significantly enhanced the generative quality of diffusion models by extrapolating between conditional and unconditional outputs. However, its inference cost (two network evaluations per denoising step) and limited applicability to distilled or single-step models have shifted research focus toward attention-space extrapolation. While these methods offer computational efficiency, their theoretical underpinnings remain elusive. In this work, we establish a foundational framework for attention-space extrapolation by modeling attention dynamics as fixed-point iterations within Modern Hopfield Networks. We demonstrate that the extrapolation effect in attention space is a special case of Anderson acceleration applied to these dynamics. Building on this insight and a weak-contraction property of the attention map, we propose Geometry-Aware Attention Guidance (GAG). By decomposing attention updates into parallel and orthogonal components relative to the guidance direction, GAG stabilizes the accelerated iteration and maximizes guidance efficiency. Our plug-and-play method integrates seamlessly with existing frameworks while significantly improving generation quality.
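The abstract does not spell out GAG's exact update rule. The sketch below shows one plausible instantiation of the directional decomposition it describes, assuming the update is split per token against a pooled guidance direction; the function name `gag_attention_guidance`, the scales `gamma` and `delta`, and the token-pooling choice are all hypothetical, not the paper's method.

```python
import torch
import torch.nn.functional as F

def gag_attention_guidance(attn_cond, attn_uncond, gamma=2.0, delta=1.0, eps=1e-8):
    """Illustrative geometry-aware attention guidance step (hypothetical API).

    attn_cond / attn_uncond: attention outputs of shape (B, T, D) from the
    conditional and unconditional (or perturbed) branches. gamma scales the
    component of the update parallel to the pooled guidance direction;
    delta scales the orthogonal remainder. These names and the pooling
    choice are assumptions for illustration only.
    """
    update = attn_cond - attn_uncond                    # raw guidance signal, (B, T, D)

    # Reference guidance direction: pooled over tokens, one unit vector per sample.
    d = F.normalize(update.mean(dim=1, keepdim=True), dim=-1, eps=eps)  # (B, 1, D)

    # Per-token decomposition relative to the guidance direction.
    par = (update * d).sum(dim=-1, keepdim=True) * d    # parallel component
    perp = update - par                                 # orthogonal component

    # Amplify the parallel part (the extrapolation) while keeping the
    # orthogonal part mild, which is meant to preserve the weak-contraction
    # condition of the underlying fixed-point iteration.
    return attn_cond + gamma * par + delta * perp
```

Note that setting `gamma == delta` collapses this to plain attention extrapolation `attn_cond + gamma * (attn_cond - attn_uncond)`; the separate scales for the two components are what would make such a decomposition geometry-aware.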