HunyuanVideo 1.5 Technical Report

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation models suffer from excessive parameter counts, high inference costs, and insufficient motion coherence. To address these challenges, this work introduces a lightweight open-source video generation model with 8.3B parameters—the first to enable high-quality, unified text-to-video and image-to-video generation across multiple durations and resolutions on consumer-grade GPUs. Methodologically, we propose Selective Sliding Tile Attention (SSTA), integrate glyph-aware text encoding, and adopt a progressive training strategy to enhance motion modeling and bilingual (Chinese–English) comprehension. Built upon an enhanced DiT architecture, the model incorporates rigorous data curation, an efficient video super-resolution network, and an end-to-end optimization pipeline. Experiments demonstrate state-of-the-art visual quality and motion coherence among open-source models. The code and pretrained weights are fully open-sourced, significantly lowering barriers for research and practical deployment in video generation.

📝 Abstract
We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source video generation models. By releasing the code and model weights, we provide the community with a high-performance foundation that lowers the barrier to video creation and research, making advanced video generation accessible to a broader audience. All open-source assets are publicly available at https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5.
Problem

Research questions and friction points this paper is trying to address.

Developing a lightweight video generation model with high visual quality
Enabling efficient video generation on consumer-grade GPU hardware
Creating a unified framework for text-to-video and image-to-video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

DiT architecture with selective and sliding tile attention (SSTA)
Glyph-aware text encoding for bilingual (Chinese–English) understanding
Efficient video super-resolution network for output quality
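The SSTA mechanism itself is not detailed on this page. As a rough illustration of the general idea behind tile-based sparse attention, the sketch below restricts each query tile to attend only to key/value tiles within a small sliding window, rather than the full sequence. The function name, tile size, and window width are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of sliding-tile sparse attention (illustrative only;
# not the SSTA implementation from HunyuanVideo 1.5).
import numpy as np

def sliding_tile_attention(q, k, v, tile=4, window=1):
    """Each query tile attends only to key/value tiles within `window`
    tiles on either side, reducing cost from O(n^2) toward O(n * tile * window)."""
    n, d = q.shape
    assert n % tile == 0, "sequence length must be divisible by tile size"
    n_tiles = n // tile
    out = np.zeros_like(v)
    for t in range(n_tiles):
        qs = slice(t * tile, (t + 1) * tile)          # query tile rows
        lo = max(0, t - window) * tile                # window start (clamped)
        hi = min(n_tiles, t + window + 1) * tile      # window end (clamped)
        scores = q[qs] @ k[lo:hi].T / np.sqrt(d)      # scaled dot-product
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)            # row-wise softmax
        out[qs] = w @ v[lo:hi]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = sliding_tile_attention(q, k, v)
print(out.shape)  # (16, 8)
```

In a video DiT, the "tiles" would be spatio-temporal blocks of latent tokens rather than 1-D segments, and a selection step would additionally pick which distant tiles to keep; this sketch shows only the windowed-sparsity principle.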
Authors

Bing Wu
Genentech
genetics, cell biology, neuroscience, single-cell sequencing

Chang Zou
Intern at EPIC Lab, Shanghai Jiao Tong University
generative models, image and video generation

Changlin Li
Tencent
Deep Learning, Computer Vision

Duojun Huang
Sun Yat-sen University
Computer Vision

Fang Yang
Tencent Hunyuan Foundation Model Team

Hao Tan
Adobe Research
Vision and Language, 3D, Multimodal

Jack Peng
Tencent Hunyuan Foundation Model Team

Jianbing Wu
Tencent Hunyuan Foundation Model Team

Jiangfeng Xiong
Tencent
AIGC

Jie Jiang
Tencent Hunyuan Foundation Model Team

Linus
Tencent Hunyuan Foundation Model Team

Patrol
Tencent Hunyuan Foundation Model Team

Peizhen Zhang
Tencent Hunyuan Foundation Model Team

Peng Chen
Tencent Hunyuan Foundation Model Team

Penghao Zhao
Tencent Hunyuan Foundation Model Team

Qi Tian
Tencent Hunyuan Foundation Model Team

Songtao Liu
Tencent Hunyuan Foundation Model Team

Weijie Kong
Tencent Hunyuan Foundation Model Team

Weiyan Wang
Tencent
Machine Learning Systems, High-Performance Computing

Xiao He
Tencent Hunyuan Foundation Model Team

Xin Li
Tencent Hunyuan Foundation Model Team

Xinchi Deng
Tencent Hunyuan Foundation Model Team

Xuefei Zhe
Tencent Hunyuan Foundation Model Team

Yang Li
Tencent Hunyuan Foundation Model Team

Yanxin Long
Tencent; Sun Yat-sen University
Computer Vision, Vision+Language