FastLightGen: Fast and Light Video Generation with Fewer Steps and Parameters

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of current video generation models, which stems from their large parameter counts and multi-step sampling procedures, hindering practical deployment. To overcome this challenge, the authors propose a collaborative distillation framework that jointly compresses both model parameters and sampling steps within a unified optimization process. By integrating knowledge distillation, parameter pruning, and few-step sampling, the framework co-designs an optimal teacher model tailored to maximize the performance of the resulting student model. Experiments on HunyuanVideo-ATI2V and WanX-TI2V demonstrate that the method achieves state-of-the-art visual quality using only four sampling steps and 30% of the original parameters, significantly outperforming existing efficient video generation approaches.

📝 Abstract
The recent advent of powerful video generation models, such as Hunyuan, WanX, Veo3, and Kling, has inaugurated a new era in the field. However, the practical deployment of these models is severely impeded by their substantial computational overhead, which stems from enormous parameter counts and the iterative, multi-step sampling process required during inference. Prior research on accelerating generative models has predominantly followed two distinct trajectories: reducing the number of sampling steps (e.g., LCM, DMD, and MagicDistillation) or compressing the model size for more efficient inference (e.g., ICMD). The potential of simultaneously compressing both to create a fast and lightweight model remains an unexplored avenue. In this paper, we propose FastLightGen, an algorithm that transforms large, computationally expensive models into fast, lightweight counterparts. The core idea is to construct an optimal teacher model, one engineered to maximize student performance, within a synergistic framework for distilling both model size and inference steps. Our extensive experiments on HunyuanVideo-ATI2V and WanX-TI2V reveal that a generator using 4-step sampling and 30% parameter pruning achieves optimal visual quality under a constrained inference budget. Furthermore, FastLightGen consistently outperforms all competing methods, establishing a new state-of-the-art in efficient video generation.
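The abstract combines two generic ingredients: pruning the generator to roughly 30% of its parameters and distilling a multi-step teacher into a few-step student. The paper's actual FastLightGen objective is not reproduced here; the sketch below is only a minimal illustration of those two ingredients, using magnitude pruning and a mean-squared distillation loss over a single linear map as stand-ins (all function names and shapes are illustrative assumptions, not the authors' implementation).

```python
import numpy as np

# Illustrative sketch only -- NOT the FastLightGen algorithm. It shows the two
# generic ingredients named in the abstract: (1) keeping ~30% of parameters via
# magnitude pruning, and (2) a distillation loss that pulls a compressed
# student's output toward the full teacher's output.

def magnitude_prune(weights, keep_ratio=0.3):
    """Zero out all but the largest-magnitude `keep_ratio` fraction of weights."""
    flat = np.abs(weights).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    # k-th largest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, -k)[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def distillation_loss(student_out, teacher_out):
    """Mean-squared error between student and teacher outputs."""
    return float(np.mean((student_out - teacher_out) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))              # toy "teacher" weights
pruned_w, mask = magnitude_prune(w, keep_ratio=0.3)

x = rng.normal(size=8)
# Teacher: full weights (stand-in for the original multi-step sampler).
# Student: pruned weights (stand-in for the 4-step, 30%-parameter generator).
loss = distillation_loss(pruned_w @ x, w @ x)
```

In a real setting the student would be trained to minimize this loss (typically alongside a few-step sampling objective), rather than the pruned weights being used as-is.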
Problem

Research questions and friction points this paper is trying to address.

video generation
computational overhead
model compression
sampling steps
efficient inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

video generation
model compression
step distillation
knowledge distillation
efficient inference