🤖 AI Summary
To address the GPU memory explosion caused by caching acceleration in diffusion-based video generation, this paper proposes a stage-wise memory optimization strategy: asynchronous cache swapping during encoding, feature chunking during denoising, and latent-space slicing during decoding, coordinated by a unified cache management mechanism. The method requires no model fine-tuning or retraining. It achieves up to a 62% reduction in peak GPU memory consumption over baseline approaches without increasing inference latency, while keeping quality degradation controlled (FVD increase below 5%). Its core innovation is the stage-specific design of the memory optimization techniques, each tailored to one phase of the inference pipeline, which yields a favorable trade-off between memory-management overhead and acceleration gains. The implementation is publicly available.
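To make the asynchronous cache swapping idea concrete, here is a minimal, hypothetical sketch (not the paper's actual implementation). It simulates device and host storage with plain dictionaries and uses a background thread to overlap the offload with other work, so a cached feature can be moved off the "device" while computation continues and fetched back just before it is needed; the class and method names are illustrative assumptions.

```python
import threading
import numpy as np

class AsyncCacheSwapper:
    """Illustrative sketch: hold only the cache entry needed for the
    current step in 'device' storage; move other entries to 'host'
    storage in a background thread so the transfer overlaps compute.
    In a real system, 'device'/'host' would be GPU/CPU memory and the
    copies would use an asynchronous transfer stream."""

    def __init__(self):
        self.host = {}      # simulated CPU-side storage
        self.device = {}    # simulated GPU-side storage
        self._worker = None

    def put(self, step, feature):
        # cache a feature produced at a given denoising step
        self.device[step] = feature

    def offload(self, step):
        # start moving a cached feature off the device without blocking
        def _move():
            self.host[step] = self.device.pop(step)
        self._worker = threading.Thread(target=_move)
        self._worker.start()

    def prefetch(self, step):
        # ensure any in-flight offload has finished, then bring the
        # requested entry back to device storage before it is used
        if self._worker is not None:
            self._worker.join()
            self._worker = None
        if step in self.host:
            self.device[step] = self.host.pop(step)
        return self.device[step]
```

A typical round trip: `put` a feature, `offload` it while the next step computes, then `prefetch` it when the reuse point arrives; the entry comes back bit-identical, only its residence changed.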
📝 Abstract
Training-free acceleration has emerged as an active research area in video generation based on diffusion models. The redundancy of latents in diffusion model inference provides a natural entry point for acceleration. In this paper, we decompose the inference process into the encoding, denoising, and decoding stages, and observe that cache-based acceleration methods often lead to substantial memory surges in the latter two stages. To address this problem, we analyze the characteristics of inference across the different stages and propose stage-specific strategies for reducing memory consumption: 1) asynchronous cache swapping; 2) feature chunking; 3) slicing latents for decoding. At the same time, we ensure that the time overhead introduced by these three strategies remains lower than the acceleration gains themselves. Compared with the baseline, our approach achieves faster inference and lower memory usage while keeping quality degradation within an acceptable range. The code is available at https://github.com/NKUShaw/LightCache .
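The third strategy, slicing latents for decoding, can be sketched generically: instead of decoding the full latent tensor at once, decode it slice by slice along the frame axis and concatenate the results, so peak memory scales with the slice size rather than the full video length. This is a simplified illustration under an assumption the paper does not necessarily make, namely that the decoder treats slices independently (a decoder with a temporal receptive field would need overlapping slices); the function name and signature are invented for the example.

```python
import numpy as np

def sliced_decode(decode_fn, latents, slice_size=2):
    """Decode latents one slice at a time along the leading (frame)
    axis to bound peak memory, then stitch the outputs back together.

    decode_fn  -- decoder applied to a slice of latents
    latents    -- array of shape (num_frames, ...) of latent codes
    slice_size -- number of frames decoded per call (memory knob)
    """
    outputs = []
    for start in range(0, latents.shape[0], slice_size):
        chunk = latents[start:start + slice_size]
        outputs.append(decode_fn(chunk))   # only one slice resident at a time
    return np.concatenate(outputs, axis=0)
```

With a slice-independent decoder, the sliced result matches the all-at-once result exactly; the trade-off is a few extra decoder invocations in exchange for a much smaller activation footprint.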