Wan: Open and Advanced Large-Scale Video Generative Models

📅 2025-03-26
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the openness, performance, and practicality bottlenecks of large-scale video generation models. Methodologically, we introduce Wan, a family of open-source video foundation models (1.3B and 14B parameters) that covers up to eight tasks, including image-to-video generation, instruction-guided video editing, and personalized video generation. We propose a novel, efficient video VAE and a scalable diffusion transformer architecture, and integrate mixed-precision training, dynamic-resolution modeling, and lightweight inference optimization. Additionally, we curate a large-scale, multi-source dataset of images and videos and build an automated evaluation framework. Experiments show that Wan-14B outperforms existing open-source and commercial models across multiple internal and external benchmarks. Wan-1.3B requires only 8.19 GB of VRAM, so it runs on a wide range of consumer-grade GPUs. All models and source code are open-sourced to foster reproducible research and community advancement.

📝 Abstract
This report presents Wan, a comprehensive and open suite of video foundation models designed to push the boundaries of video generation. Built upon the mainstream diffusion transformer paradigm, Wan achieves significant advancements in generative capabilities through a series of innovations, including our novel VAE, scalable pre-training strategies, large-scale data curation, and automated evaluation metrics. These contributions collectively enhance the model's performance and versatility. Specifically, Wan is characterized by four key features:
Leading Performance: The 14B model of Wan, trained on a vast dataset comprising billions of images and videos, demonstrates the scaling laws of video generation with respect to both data and model size. It consistently outperforms existing open-source models as well as state-of-the-art commercial solutions across multiple internal and external benchmarks, demonstrating a clear and significant performance superiority.
Comprehensiveness: Wan offers two capable models, i.e., 1.3B and 14B parameters, for efficiency and effectiveness respectively. It also covers multiple downstream applications, including image-to-video, instruction-guided video editing, and personal video generation, encompassing up to eight tasks.
Consumer-Grade Efficiency: The 1.3B model demonstrates exceptional resource efficiency, requiring only 8.19 GB VRAM, making it compatible with a wide range of consumer-grade GPUs.
Openness: We open-source the entire series of Wan, including source code and all models, with the goal of fostering the growth of the video generation community. This openness seeks to significantly expand the creative possibilities of video production in the industry and provide academia with high-quality video foundation models.
All the code and models are available at https://github.com/Wan-Video/Wan2.1.
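As a rough sanity check on the consumer-grade efficiency figure, the sketch below estimates how much memory the weights alone would occupy for the two parameter counts given in the abstract. The precisions listed and the helper `weight_memory_gb` are illustrative assumptions, not details from the report; the estimate ignores activations, auxiliary components, and framework overhead.

```python
# Weight-only memory estimate (assumption: dense weights at a uniform precision).
# Ignores activations, auxiliary networks, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

# Parameter counts as stated in the abstract.
MODEL_SIZES = {"Wan 1.3B": 1.3e9, "Wan 14B": 14e9}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, num_params in MODEL_SIZES.items():
        for precision, nbytes in BYTES_PER_PARAM.items():
            gb = weight_memory_gb(num_params, nbytes)
            print(f"{name} @ {precision}: {gb:.1f} GB")
```

At 16-bit precision the 1.3B weights come to roughly 2.6 GB and the 14B weights to roughly 28 GB. The reported 8.19 GB presumably reflects total inference memory rather than weights alone, so this comparison is only indicative.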
Problem

Research questions and friction points this paper is trying to address.

Advancing video generation through scalable diffusion transformer models
Enhancing performance with novel VAE and large-scale data curation
Providing open-source models for diverse video production tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel VAE enhances video generation quality
Scalable pre-training with large datasets
Automated metrics for performance evaluation
Authors

WanTeam
Ang Wang
Alibaba Group
Baole Ai
Alibaba
Bin Wen
Kuaishou
MLLM
Chaojie Mao
Alibaba Group
Computer Vision
Chen-Wei Xie
Alibaba Group
Computer Vision, Machine Learning
Di Chen
Alibaba Group
Feiwu Yu
Alibaba Group
Haiming Zhao
Alibaba Group
Jianxiao Yang
Alibaba Group
Jianyuan Zeng
Alibaba Group
Jiayu Wang
Beihang University & Jiangnan University & The University of Auckland
Soft sensor, data driven, fault detection, process monitoring
Jingfeng Zhang
Alibaba Group
Jingren Zhou
Alibaba Group, Microsoft
Cloud Computing, Large Scale Distributed Systems, Machine Learning, Query Processing, Query
Jinkai Wang
Alibaba Group
Jixuan Chen
UC San Diego
Multimodal agents, Natural language processing, Machine learning
Kai Zhu
Alibaba Group
Kang Zhao
Alibaba Group
Keyu Yan
Alibaba Group
Lianghua Huang
Tongyi Lab
generative modeling
Meng Feng
Alibaba Group
Ningyi Zhang
Alibaba Group
Pandeng Li
University of Science and Technology of China & Alibaba Tongyi
Video Retrieval, Video Generation, Representation Learning
Pingyu Wu
University of Science and Technology of China
computer vision
Ruihang Chu
Tsinghua University, CUHK, Wan
Generative AI, Vision-Language Model, Computer Vision
Rui Feng
Alibaba Group
Shiwei Zhang
Alibaba Group
Siyang Sun
Alibaba Group
deep learning, multi-modal large language model
Tao Fang
Alibaba Group
Tianxing Wang
Alibaba Group
Tianyi Gui
Alibaba Group
Tingyu Weng
Alibaba Group
Tong Shen
Process Engineer, Enex International Inc.
Wei Lin
Alibaba Group
Wei Wang
Alibaba Group
Wenmeng Zhou
Alibaba Group
Wente Wang
Alibaba Group
Wenting Shen
Qingdao University
cloud computing, data integrity auditing
Wenyuan Yu
Alibaba Group
Graph computation, data management, distributed systems and parallel computation
Xianzhong Shi
Alibaba Group
Xiaomin Huang
Alibaba Group
Xin Xu
Alibaba Group
Yan Kou
Alibaba Group
Yangyu Lv
Alibaba Group
Yifei Li
Alibaba Group
Yijing Liu
Huazhong University of Science and Technology, National Institutes of Health
Nanomedicine, Microneedle, Drug Delivery, Self-assembly
Yiming Wang
Alibaba Group
Yingya Zhang
Alibaba Group
Yitong Huang
Alibaba Group
Yong Li
Alibaba Group
You Wu
Alibaba Group
Yu Liu
Alibaba Group
Yulin Pan
Alibaba Group
computer vision, multimedia search
Yun Zheng
Alibaba
Computer Vision, Multimodal Modeling
Yuntao Hong
Alibaba Group
Yupeng Shi
Alibaba Group
Yutong Feng
Alibaba Tongyi Lab | Tsinghua University
Generative AI, Computer Vision
Zeyinzi Jiang
Alibaba Group
Zhen Han
Alibaba Group
Zhi-Fan Wu
Alibaba Group
Ziyu Liu
Alibaba Group