Enhance-A-Video: Better Generated Video for Free

📅 2025-02-11

🤖 AI Summary

DiT-based video generation models can produce videos with weak temporal coherence and reduced visual fidelity. Enhance-A-Video is a training-free, plug-and-play method that strengthens cross-frame correlations using the non-diagonal (off-diagonal) entries of temporal attention maps, aiming to improve temporal consistency and visual quality without modifying model parameters and without retraining or fine-tuning. Thanks to its simple design, it can be applied to most DiT-based video generation frameworks at inference time. Experiments across various DiT-based video generation models show promising improvements in both temporal consistency and visual quality.

📝 Abstract

DiT-based video generation has achieved remarkable results, but enhancing existing models remains relatively unexplored. In this work, we introduce Enhance-A-Video, a training-free approach to enhance the coherence and quality of DiT-based generated videos. The core idea is to enhance cross-frame correlations based on non-diagonal temporal attention distributions. Thanks to its simple design, our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning. Across various DiT-based video generation models, our approach demonstrates promising improvements in both temporal consistency and visual quality. We hope this research can inspire future explorations in video generation enhancement.
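
The core idea can be illustrated with a small NumPy sketch. It is not the paper's implementation: the abstract only says that cross-frame correlations are enhanced based on non-diagonal temporal attention, so the function names, the `temperature` parameter, the frame-count normalization, and the clamp at 1 are all assumptions made for illustration. The sketch builds a temporal attention map over frames, averages its off-diagonal entries as a measure of cross-frame correlation, and uses that value to rescale the attention output.

```python
import numpy as np


def temporal_attention(q, k):
    """Softmax attention map over the frame axis; q and k have shape (frames, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def cross_frame_intensity(attn):
    """Mean of the off-diagonal entries of a (frames, frames) attention map."""
    n = attn.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)  # True everywhere except the diagonal
    return float(attn[off_diagonal].mean())


def enhance(attn_out, attn, temperature=1.0):
    """Rescale the temporal attention output by an off-diagonal-based factor.

    Assumption (hypothetical, not from the source): the factor is
    temperature * frames * mean(off-diagonal), clamped so it never drops
    below 1, which leaves the output unchanged when frames ignore each other.
    """
    n = attn.shape[0]
    scale = max(1.0, temperature * n * cross_frame_intensity(attn))
    return scale * attn_out


# Usage: 4 frames, 8-dimensional features for one spatial location.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

attn = temporal_attention(q, k)
enhanced = enhance(attn @ v, attn, temperature=2.0)
```

Because only a scalar derived from an attention map is computed and applied, a scheme of this shape needs no new parameters and no training, which is consistent with the training-free claim above.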

Problem

Research questions and friction points this paper is trying to address.

Enhancing DiT-based video generation

Improving cross-frame coherence

Training-free quality enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free video enhancement

Cross-frame correlation improvement

Non-diagonal temporal attention
