AI Summary
Attention mechanisms in diffusion models lack systematic analysis regarding their functional roles, design principles, and cross-modal/task generalizability. Method: We propose the first unified taxonomy for attention modifications in diffusion models, categorizing improvements by architectural component (e.g., U-Net backbone, cross-attention, spatial/channel-wise attention) and integrating multi-dimensional analysis: architectural characterization, modality-aware comparison, performance attribution, and limitation diagnosis. Contribution/Results: Our analysis reveals distinct contribution pathways of attention to generation quality, training stability, sampling efficiency, and controllable editing. We identify critical bottlenecks: poor scalability, high computational redundancy, and weak theoretical interpretability. The framework provides a structured design guide for attention-augmented diffusion models and motivates future directions, including attention sparsification, modular co-optimization, and interpretable attention modeling, to advance both efficacy and understanding.
Abstract
Attention mechanisms have become a foundational component in diffusion models, significantly influencing their capacity across a wide range of generative and discriminative tasks. This paper presents a comprehensive survey of attention within diffusion models, systematically analysing its roles, design patterns, and operations across different modalities and tasks. We propose a unified taxonomy that categorises attention-related modifications according to the structural components they affect, offering a clear lens through which to understand their functional diversity. In addition to reviewing architectural innovations, we examine how attention mechanisms contribute to performance improvements in diverse applications. We also identify current limitations and underexplored areas, and outline potential directions for future research. Our study provides valuable insights into the evolving landscape of diffusion models, with a particular focus on the integrative and ubiquitous role of attention.