Enhance-A-Video: Better Generated Video for Free

📅 2025-02-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
DiT-based video generation models can suffer from weak temporal coherence and suboptimal visual fidelity. To address this, we propose a training-free, plug-and-play, inference-time enhancement method that strengthens cross-frame relationships by reweighting the off-diagonal regions of temporal attention maps. This improves temporal consistency and sharpness without modifying model parameters or incurring any training overhead. To our knowledge, this is the first training-free enhancement approach for DiT video generators; it is compatible with diverse DiT variants and needs no extra passes beyond the normal forward pass at inference. Experiments across multiple DiT video generation models show improved motion coherence and detail fidelity, with little additional computational or memory overhead during inference. The method shows that a model's existing attention structure can be used for lightweight, parameter-free temporal refinement.

πŸ“ Abstract
DiT-based video generation has achieved remarkable results, but research into enhancing existing models remains relatively unexplored. In this work, we introduce a training-free approach to enhance the coherence and quality of DiT-based generated videos, named Enhance-A-Video. The core idea is enhancing the cross-frame correlations based on non-diagonal temporal attention distributions. Thanks to its simple design, our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning. Across various DiT-based video generation models, our approach demonstrates promising improvements in both temporal consistency and visual quality. We hope this research can inspire future explorations in video generation enhancement.
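
The abstract's core idea, using the non-diagonal (cross-frame) entries of a temporal attention map to drive enhancement, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `temperature` parameter, and the clamp at 1 are assumptions made for illustration, and plain Python lists stand in for real attention tensors.

```python
# Illustrative sketch only: the names, the clamp, and the temperature
# parameter are assumptions, not the paper's exact formulation.

def off_diagonal_mean(attn):
    """Average the cross-frame weights attn[i][j] (i != j) of an F x F map."""
    frames = len(attn)
    if frames < 2 or any(len(row) != frames for row in attn):
        raise ValueError("expected a square map over at least two frames")
    total = sum(
        attn[i][j] for i in range(frames) for j in range(frames) if i != j
    )
    return total / (frames * (frames - 1))


def enhancement_scale(attn, temperature=1.0):
    """Hypothetical scale: temperature * cross-frame mean, never below 1."""
    return max(1.0, temperature * off_diagonal_mean(attn))


def enhance_output(temporal_out, attn, temperature=1.0):
    """Scale a temporal-attention output vector by the enhancement factor."""
    scale = enhancement_scale(attn, temperature)
    return [scale * value for value in temporal_out]


# Toy 3-frame attention map; each row sums to 1, as after a softmax.
attn = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.25, 0.25, 0.5],
]
enhanced = enhance_output([1.0, 2.0], attn, temperature=6.0)
```

In this sketch the diagonal entries (each frame attending to itself) are ignored, so the scale depends only on how strongly frames attend to one another. The clamp keeps the original output unchanged when cross-frame attention is weak; whether the actual method behaves this way is not stated in the abstract.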

Problem

Research questions and friction points this paper is trying to address.

Enhancing DiT-based video generation
Improving cross-frame coherence
Training-free quality enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free video enhancement
Cross-frame correlation improvement
Non-diagonal temporal attention