🤖 AI Summary
This work addresses the high inference latency of VAE decoders, which has become a critical bottleneck in latent diffusion models for video generation. The authors propose a general, plug-and-play VAE acceleration framework that strictly preserves the original latent-space distribution while significantly improving efficiency. By combining independence-aware channel pruning, stage-wise dominant-operator optimization (including an improved causal 3D convolution), and a three-phase dynamic knowledge-distillation strategy, the method effectively transfers the capabilities of the original model. Evaluated on the Wan and LTX-Video VAE decoders, the approach achieves approximately a 6× speedup in VAE decoding while retaining 96.9% of the original reconstruction performance, yielding up to a 36% end-to-end speedup in video generation with negligible quality degradation.
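To make "independence-aware channel pruning" concrete, here is a hypothetical, dependency-free sketch: a channel is kept only if it is sufficiently independent of the channels already kept, with independence approximated here by absolute Pearson correlation of channel activations. This is an illustration of the general idea, not the paper's actual criterion or implementation; the names `pearson` and `prune_channels` are invented for this sketch.

```python
# Hedged sketch of independence-aware channel pruning: greedily keep a
# channel only if its activations are weakly correlated with every
# already-kept channel. Highly correlated (redundant) channels are pruned.

def pearson(a, b):
    """Pearson correlation of two equal-length activation vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def prune_channels(activations, threshold=0.95):
    """activations: one flattened activation vector per channel.
    Returns indices of channels kept, in first-come greedy order."""
    kept = []
    for i, ch in enumerate(activations):
        if all(abs(pearson(ch, activations[j])) < threshold for j in kept):
            kept.append(i)
    return kept

chans = [
    [1.0, 2.0, 3.0, 4.0],  # channel 0
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with channel 0 -> pruned
    [4.0, 1.0, 3.0, 2.0],  # weakly correlated with channel 0 -> kept
]
print(prune_channels(chans))  # -> [0, 2]
```

In practice such a criterion would be evaluated on activation statistics gathered over a calibration set, and the pruned decoder then recovered via the distillation stage the summary describes.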
📝 Abstract
Latent diffusion models have enabled high-quality video synthesis, yet their inference remains costly and time-consuming. As diffusion transformers become increasingly efficient, the latency bottleneck inevitably shifts to VAE decoders. To reduce their latency while maintaining quality, we propose a universal acceleration framework for VAE decoders that preserves full alignment with the original latent distribution. Specifically, we introduce (1) an independence-aware channel pruning method to effectively mitigate severe channel redundancy, and (2) a stage-wise dominant operator optimization strategy to address the high inference cost of the widely used causal 3D convolutions in VAE decoders. Based on these innovations, we construct a Flash-VAED family. Moreover, we design a three-phase dynamic distillation framework that efficiently transfers the capabilities of the original VAE decoder to Flash-VAED. Extensive experiments on Wan and LTX-Video VAE decoders demonstrate that our method outperforms baselines in both quality and speed, achieving approximately a 6$\times$ speedup while retaining up to 96.9% of the original reconstruction performance. Notably, Flash-VAED accelerates the end-to-end generation pipeline by up to 36% with negligible quality drops on VBench-2.0.
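The causal 3D convolutions the abstract identifies as the dominant operator differ from ordinary 3D convolutions only in how the temporal axis is padded. Below is a minimal sketch of that temporal behavior (spatial dimensions omitted for clarity, and `causal_conv1d` is a name invented here): with a temporal kernel of size k, all k-1 padding frames go in front, so the output at frame t never depends on frames later than t.

```python
# Sketch of the temporal part of a causal 3D convolution: pad only the
# "past" side of the time axis so each output frame depends solely on
# the current and earlier frames.

def causal_conv1d(frames, kernel):
    """frames: one value per video frame (spatial dims omitted);
    kernel: k weights, kernel[-1] applied to the current frame."""
    k = len(kernel)
    # Replicate-pad k-1 frames at the front only; no future frames leak in.
    padded = [frames[0]] * (k - 1) + frames
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(frames))
    ]

# With an identity kernel only the current frame contributes, confirming
# that no future frame influences any output position.
print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0]))
# -> [1.0, 2.0, 3.0, 4.0]
```

This one-sided padding is what lets video VAEs decode frames streamingly, but it also makes the operator expensive at high resolution, which is what the paper's stage-wise dominant operator optimization targets.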