HunyuanVideo: A Systematic Framework For Large Video Generative Models

📅 2024-12-03
📈 Citations: 1
Influential: 0
🤖 AI Summary
Open-source video generation models significantly lag behind proprietary counterparts, widening the quality gap between industry and the public. Method: We introduce the first ultra-large-scale open-source video foundation model (13B+ parameters), covering the full stack, from dataset curation and architecture design to progressive training and efficient inference. Key innovations include a spatiotemporally decoupled diffusion architecture, multi-stage data cleaning and synthetic data augmentation, progressive scaling during training, and a lightweight inference engine. Contribution/Results: The model achieves state-of-the-art performance among open-source models in visual fidelity, motion coherence, text-video alignment, and camera motion modeling, surpassing Runway Gen-3, Luma 1.6, and three top-performing Chinese models. Fully open-sourced code and weights foster fair, reproducible, and sustainable community advancement in video generation research.

📝 Abstract
Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.
Problem

Research questions and friction points this paper is trying to address.

Video Quality Enhancement
Code Accessibility
Advanced Videography Techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source Video Generation
Large-scale Model Training
Realistic Video Production
Authors
Weijie Kong
Qi Tian
Zijian Zhang
Rox Min
Zuozhuo Dai
Jin Zhou
Jiangfeng Xiong (Tencent · AIGC)
Xin Li
Bo Wu
Jianwei Zhang
Kathrina Wu
Qin Lin
Aladdin Wang
Andong Wang
Jiawang Bai (Tsinghua University · AIGC)
Changlin Li (Tencent · Deep Learning, Computer Vision)
Duojun Huang (Sun Yat-sen University · Computer Vision)
Fang Yang
Hao Tan (Adobe Research · Vision and Language, 3D Multimodal)
Hongmei Wang
Jacob Song
Jianbing Wu
Jinbao Xue
Joey Wang
Junkun Yuan (Research Scientist, Tencent · Computer Vision, Multimodal AI, Generative AI)
Kai Wang
Mengyang Liu (City University of Hong Kong · Deep Learning, Computer Vision, AIGC)
Pengyu Li
Shuai Li
Weiyan Wang (Tencent · Machine Learning Systems, High Performance Computing)
Wenqing Yu
Xinchi Deng
Yanxin Long (Tencent; Sun Yat-sen University · Computer Vision, Vision+Language)
Yi Chen
Yutao Cui (Tencent Hunyuan · Generative Models, Multi-Modal, Object Tracking)
Yuanbo Peng
Zhentao Yu (Researcher, Tencent Hunyuan · Computer Vision)
Zhiyu He (Tsinghua University · Recommendation)
Zhiyong Xu
Zixiang Zhou
Zunnan Xu (Tsinghua University · Computer Vision, Machine Learning)
Yangyu Tao
Qinglin Lu
Songtao Liu
Daquan Zhou (Bytedance, US · Artificial Intelligence, Deep Learning)
Hongfa Wang
Yong Yang
Di Wang
Yuhong Liu (Santa Clara University · Trustworthy AI, Security and Privacy, IoT, Blockchain, Social Networks)
Jie Jiang
Caesar Zhong