🤖 AI Summary
Diffusion models suffer from slow inference, in part because self-attention scales quadratically with input size. To address this, we propose a training-free, dynamic token merging method. Our approach, cached adaptive token merging (CA-ToMe), dynamically adjusts the merging threshold based on inter-token similarity across denoising steps and reuses historically similar token pairs to improve stability and efficiency. Crucially, the method requires no fine-tuning or retraining and removes redundant tokens on the fly during the denoising process. Experiments demonstrate a 1.24× speedup in end-to-end inference while preserving FID, alongside reductions in GPU memory consumption and computational overhead. This work establishes a lightweight, general-purpose, plug-and-play paradigm for efficient diffusion-model inference.
📝 Abstract
Diffusion models have emerged as a promising approach for generating high-quality, high-dimensional images. Nevertheless, these models are hindered by high computational cost and slow inference, partly due to the quadratic complexity of the self-attention mechanism with respect to input size. Various approaches have been proposed to address this drawback. One of them, token merging (ToMe), reduces the number of tokens fed into self-attention by computing pairwise token similarities and merging the r proportion of the most similar tokens. Motivated by the repetitive similarity patterns observed across adjacent denoising steps and the variation in the frequency of similarities, our method, cached adaptive token merging (CA-ToMe), enhances this approach with an adaptive merging threshold and a caching mechanism that stores similar token pairs across several adjacent steps. Empirical results demonstrate that our method operates as a training-free acceleration method, achieving a speedup factor of 1.24 in the denoising process while maintaining the same FID scores as existing approaches.
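The merge-then-cache loop described above can be sketched as follows. This is an illustrative assumption, not the paper's exact algorithm: `CachedMerger`, the greedy disjoint-pair selection, and the momentum-style threshold update are hypothetical stand-ins (ToMe itself uses bipartite soft matching), and the sketch assumes the token count stays fixed across denoising steps so cached pair indices remain valid.

```python
import numpy as np

def pairwise_cosine(tokens):
    """(N, d) token matrix -> (N, N) cosine-similarity matrix."""
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    return normed @ normed.T

def top_pairs(sim, num_merges):
    """Greedily pick the num_merges most-similar disjoint token pairs."""
    n = sim.shape[0]
    order = sorted(((sim[i, j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    pairs, used = [], set()
    for _, i, j in order:
        if len(pairs) == num_merges:
            break
        if i in used or j in used:
            continue
        pairs.append((i, j))
        used.update((i, j))
    return pairs

def merge(tokens, pairs):
    """Average each (i, j) pair into slot i and drop slot j."""
    out = tokens.copy()
    keep = np.ones(len(tokens), dtype=bool)
    for i, j in pairs:
        out[i] = (out[i] + out[j]) / 2
        keep[j] = False
    return out[keep]

class CachedMerger:
    """Hypothetical sketch of cached adaptive merging: pairs found at one
    denoising step are reused at the next step as long as their similarity
    still exceeds a threshold that tracks the running mean pair similarity."""

    def __init__(self, r, init_threshold=0.9, momentum=0.9):
        self.r = r                      # proportion of tokens to merge
        self.threshold = init_threshold # adaptive merging threshold
        self.momentum = momentum        # smoothing for threshold updates
        self.cached_pairs = None        # pairs remembered from prior steps

    def step(self, tokens):
        sim = pairwise_cosine(tokens)
        num_merges = int(self.r * len(tokens))
        if self.cached_pairs is not None and all(
            sim[i, j] >= self.threshold for i, j in self.cached_pairs
        ):
            pairs = self.cached_pairs           # cache hit: reuse pairs
        else:
            pairs = top_pairs(sim, num_merges)  # cache miss: recompute
            self.cached_pairs = pairs
        if pairs:
            # nudge the threshold toward the mean similarity actually merged
            mean_sim = float(np.mean([sim[i, j] for i, j in pairs]))
            self.threshold = (self.momentum * self.threshold
                              + (1 - self.momentum) * mean_sim)
        return merge(tokens, pairs)
```

Under this sketch, a step with four tokens and r = 0.25 merges the single most similar pair, returning three tokens; a second step with similar tokens reuses the cached pair instead of re-ranking all candidates.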