Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In text-to-image (T2I) diffusion models, the text encoder can consume up to 8× more memory than the denoising module, making it a critical deployment bottleneck despite contributing little to inference time and FLOPs. To address this, the paper proposes Skrr (Skip and Re-use layers), a blockwise pruning strategy tailored to text encoders in T2I generation: it exploits the redundancy of transformer blocks by selectively skipping some layers and reusing others during the single forward pass the encoder requires, reducing memory consumption without compromising performance. Extensive experiments show that Skrr preserves image quality even at high sparsity levels, outperforming existing blockwise pruning methods and achieving state-of-the-art memory efficiency across multiple metrics, including FID, CLIP, DreamSim, and GenEval scores.
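The skip-and-reuse idea in the summary can be sketched as a scheduled forward pass over the encoder's blocks. The sketch below is illustrative only: the `schedule` format and the toy blocks are assumptions, and the paper derives its actual layer selection from a structural analysis of block redundancy, which is not reproduced here.

```python
# Hypothetical sketch of a "skip and re-use" forward pass over a text
# encoder's transformer blocks, in the spirit of Skrr. Each schedule
# entry says what to do at one original block position:
#   ("run", i)   -> apply block i (its weights are kept in memory)
#   ("reuse", i) -> re-apply an already-kept block i
#   ("skip",)    -> identity; the block's weights are never loaded
# Blocks that appear in no ("run", i) or ("reuse", i) entry can be
# dropped from memory entirely, which is where the savings come from.

def skrr_forward(x, blocks, schedule):
    for action in schedule:
        if action[0] in ("run", "reuse"):
            x = blocks[action[1]](x)
        # "skip": identity, nothing to compute or load
    return x


# Toy blocks standing in for transformer layers.
blocks = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

# Skip block 1 and re-use block 0 in place of block 2;
# block 1's and block 2's weights would never be loaded.
schedule = [("run", 0), ("skip",), ("reuse", 0)]
out = skrr_forward(0, blocks, schedule)  # (0 + 1), skipped, (+ 1) -> 2
```

In a real deployment the schedule would be fixed offline, so the unused blocks are simply never materialized on the device.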

📝 Abstract
Large-scale text encoders in text-to-image (T2I) diffusion models have demonstrated exceptional performance in generating high-quality images from textual prompts. Unlike denoising modules that rely on multiple iterative steps, text encoders require only a single forward pass to produce text embeddings. However, despite their minimal contribution to total inference time and floating-point operations (FLOPs), text encoders demand significantly higher memory usage, up to eight times more than denoising modules. To address this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet effective pruning strategy specifically designed for text encoders in T2I diffusion models. Skrr exploits the inherent redundancy in transformer blocks by selectively skipping or reusing certain layers in a manner tailored for T2I tasks, thereby reducing memory consumption without compromising performance. Extensive experiments demonstrate that Skrr maintains image quality comparable to the original model even under high sparsity levels, outperforming existing blockwise pruning methods. Furthermore, Skrr achieves state-of-the-art memory efficiency while preserving performance across multiple evaluation metrics, including the FID, CLIP, DreamSim, and GenEval scores.
Problem

Research questions and friction points this paper is trying to address.

High memory usage of text encoders in T2I diffusion models
Preserving image quality under aggressive pruning
Exploiting the redundancy of transformer blocks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Skip and Re-use layers
Memory efficient text-to-image
Transformer blocks redundancy exploitation
Hoigi Seo
Dept. of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Wongi Jeong
Dept. of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Jae-sun Seo
Cornell Tech
VLSI / ASIC, Digital/Mixed-Signal Circuits, FPGA, ML Hardware Design, Neuromorphic Computing
Se Young Chun
Department of Electrical and Computer Engineering, Seoul National University
computational imaging, machine learning, signal processing, multimodal processing