Accelerating Controllable Generation via Hybrid-grained Cache

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Controllable generation models suffer from low inference efficiency because control conditions and content synthesis must be processed jointly. To address this, we propose a hybrid-granularity caching mechanism that jointly deploys block-level coarse-grained caching (reusing intermediate features) and prompt-level fine-grained caching (reusing cross-attention maps) within encoder-decoder architectures, enabling step-skipping computation and feature reuse. The method requires no architectural modifications or retraining and is compatible with diverse control modalities. Evaluated on four benchmarks including COCO-Stuff, it reduces MACs by 63% (from 18.22T to 6.70T) while incurring at most 1.5% degradation in semantic fidelity, significantly improving the efficiency-quality trade-off. The core contribution is the first introduction of multi-granularity caching into controllable-generation inference, enabling efficient, near-lossless real-time generation.

📝 Abstract
Controllable generative models have been widely used to improve the realism of synthetic visual content. However, such models must handle the computational demands of both control conditions and content generation, resulting in generally low generation efficiency. To address this issue, we propose a Hybrid-Grained Cache (HGC) approach that reduces computational overhead by adopting cache strategies of different granularities at different computational stages. Specifically, (1) we use a coarse-grained, block-level cache based on feature reuse to dynamically bypass redundant computations in encoder-decoder blocks between inference steps; and (2) we design a fine-grained, prompt-level cache that acts within a module, reusing cross-attention maps across consecutive inference steps and extending them to the corresponding module computations of adjacent steps. These caches of different granularities integrate seamlessly into each computational stage of the controllable generation process. We verify the effectiveness of HGC on four benchmark datasets, with particular attention to its advantage in balancing generation efficiency and visual quality. For example, on the COCO-Stuff segmentation benchmark, HGC reduces computational cost (MACs) by 63% (from 18.22T to 6.70T) while keeping the loss of semantic fidelity (quantified performance degradation) within 1.5%.
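The coarse-grained, block-level cache described in the abstract can be illustrated with a minimal sketch: wrap each encoder-decoder block so that, when its input changes little between adjacent inference steps, the cached output is reused and the block computation is skipped. The class name, relative-change test, and threshold below are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

class BlockCache:
    """Sketch of a coarse-grained (block-level) cache: reuse a block's
    output across adjacent denoising steps when its input barely changes.
    The relative-change threshold is an assumed, illustrative criterion."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.last_input = None
        self.last_output = None

    def __call__(self, block, x):
        if self.last_input is not None:
            # Relative change of the block input since the cached step.
            delta = (np.linalg.norm(x - self.last_input)
                     / (np.linalg.norm(self.last_input) + 1e-8))
            if delta < self.threshold:
                # Cache hit: bypass the block, reuse cached features.
                return self.last_output
        out = block(x)
        self.last_input = np.array(x, copy=True)
        self.last_output = out
        return out
```

In a denoising loop, one such wrapper per block lets early steps (where features change quickly) recompute while later, near-redundant steps are skipped, which is where the MAC savings would come from.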
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in controllable generative models
Improving generation efficiency while maintaining visual quality
Dynamically bypassing redundant computations with hybrid cache strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-grained cache reduces computational overhead
Coarse-grained cache bypasses redundant encoder-decoder computations
Fine-grained cache reuses cross-attention maps across steps
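The fine-grained, prompt-level idea above can be sketched as a cross-attention module that recomputes its query-key attention map only periodically and reuses the cached map on the steps in between, while the value aggregation still runs every step. The fixed reuse interval and all names here are assumptions for illustration, not the paper's schedule.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CrossAttnCache:
    """Sketch of a fine-grained (prompt-level) cache: the attention map
    over the text prompt is recomputed only every `interval` steps and
    reused at adjacent steps, skipping the QK^T softmax in between."""

    def __init__(self, interval=2):
        self.interval = interval
        self.cached_attn = None

    def attend(self, step, q, k, v):
        if step % self.interval == 0 or self.cached_attn is None:
            scores = q @ k.T / np.sqrt(q.shape[-1])
            self.cached_attn = softmax(scores, axis=-1)
        # Value aggregation always runs; only the map is cached.
        return self.cached_attn @ v
```

Because prompt tokens are fixed during sampling, the cross-attention map tends to drift slowly across consecutive steps, which is the intuition that makes this reuse near-lossless.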
Lin Liu
University of Science and Technology of China
Huixia Ben
Anhui University of Science and Technology
Shuo Wang
University of Science and Technology of China
Jinda Lu
University of Science and Technology of China
Junxiang Qiu
University of Science and Technology of China
Shengeng Tang
Hefei University of Technology
Yanbin Hao
Hefei University of Technology