Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video diffusion Transformers (vDiTs) suffer from high computational overhead during inference, and existing acceleration methods rely on heuristic designs with poor generalizability. To address this, we propose the first end-to-end, token-level automatic acceleration framework for vDiTs. Our approach introduces three key innovations: (1) a lightweight dynamic token selection mechanism that adaptively preserves salient intra-frame and inter-frame tokens; (2) a GPU-optimized sparse attention scheduling strategy; and (3) an evolutionary algorithm-based, timestep-aware token budget search framework that jointly optimizes fidelity and efficiency. Evaluated on standard benchmarks, our method achieves 2.4× speedup on a single GPU and scalable 13.2× acceleration across eight GPUs, while incurring less than 0.5% degradation in VBench quality—outperforming state-of-the-art acceleration techniques by a significant margin.
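The dynamic token selection the summary describes can be pictured as a saliency-based top-k filter over latent video tokens. The sketch below is purely illustrative: the function name, the use of L2 feature magnitude as the saliency signal, and the single `budget` fraction are assumptions, not the paper's actual mechanism.

```python
import numpy as np

def select_salient_tokens(tokens: np.ndarray, budget: float):
    """Keep the top `budget` fraction of tokens by L2 feature magnitude.

    tokens: (num_tokens, dim) array of latent video tokens.
    Returns the kept tokens and their (sorted) indices so the full
    sequence can be reassembled after the transformer block.
    """
    n = tokens.shape[0]
    k = max(1, int(n * budget))                 # token budget for this step
    saliency = np.linalg.norm(tokens, axis=-1)  # per-token magnitude proxy
    idx = np.sort(np.argsort(saliency)[-k:])    # top-k, temporal order kept
    return tokens[idx], idx
```

Because the kept set is recomputed per call, a schedule that varies `budget` across timesteps (as the framework's search does) plugs in directly.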

📝 Abstract
Video diffusion transformers (vDiTs) have made impressive progress in text-to-video generation, but their high computational demands present major challenges for practical deployment. While existing acceleration methods reduce workload at various granularities, they often rely on heuristics, limiting their applicability. We introduce ASTRAEA, an automatic framework that searches for near-optimal configurations for vDiT-based video generation. At its core, ASTRAEA proposes a lightweight token selection mechanism and a memory-efficient, GPU-parallel sparse attention strategy, enabling linear reductions in execution time with minimal impact on generation quality. To determine the optimal token reduction for each timestep, we further design a search framework that leverages a classic evolutionary algorithm to automatically distribute the token budget across timesteps. Together, ASTRAEA achieves up to 2.4x inference speedup on a single GPU with great scalability (up to 13.2x speedup on 8 GPUs) while retaining better video quality than state-of-the-art methods (<0.5% loss on the VBench score relative to the baseline vDiT models).
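The linear runtime reduction the abstract claims comes from shrinking the attention score matrix: if only m of n tokens survive selection, scores go from n x n to n x m. The gather-based sketch below shows that arithmetic; it is a stand-in for the paper's memory-efficient GPU-parallel kernel, and all names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gathered_attention(q, k, v, keep_idx):
    """Attention where queries attend only to a kept subset of keys/values.

    q: (n, d); k, v: (n, d); keep_idx: indices of the m surviving tokens.
    The score matrix is (n, m) instead of (n, n), which is the source of
    the linear cost reduction as the token budget shrinks.
    """
    ks, vs = k[keep_idx], v[keep_idx]            # (m, d) gathered keys/values
    scores = q @ ks.T / np.sqrt(q.shape[-1])     # (n, m) reduced score matrix
    return softmax(scores, axis=-1) @ vs
```

With `keep_idx` covering every token this reduces to ordinary dense attention, so the sparse path can be validated against the dense one.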
Problem

Research questions and friction points this paper is trying to address.

Accelerating video diffusion transformers for efficient deployment
Reducing computational demands while maintaining generation quality
Optimizing token selection and GPU-parallel sparse attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight token selection mechanism
GPU-parallel sparse attention strategy
Evolutionary algorithm for timestep-wise token budget allocation
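The third contribution, searching for a per-timestep token budget with a classic evolutionary algorithm, can be sketched as a simple mutate-and-select loop. Everything below is a toy stand-in under stated assumptions: the paper does not specify this population scheme, and `fitness` here abstracts whatever quality-versus-cost objective the framework actually optimizes.

```python
import random

def evolve_budgets(num_steps, fitness, pop_size=16, gens=30, sigma=0.05):
    """Toy evolutionary search for a per-timestep token-budget schedule.

    fitness: callable scoring a schedule (higher is better), e.g. a
    generation-quality proxy penalized by compute cost.
    Returns the best schedule found; budgets are clamped to [0.05, 1.0].
    """
    pop = [[random.uniform(0.2, 1.0) for _ in range(num_steps)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = [
            [min(1.0, max(0.05, b + random.gauss(0, sigma))) for b in p]
            for p in parents                    # Gaussian mutation
        ]
        pop = parents + children                # elitism: parents survive
    return max(pop, key=fitness)
```

Keeping the parents each generation guarantees the best schedule never regresses, which matters when each fitness evaluation (a full or partial video generation) is expensive.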
👥 Authors
Haosong Liu
Shanghai Jiao Tong University
Yuge Cheng
Shanghai Jiao Tong University
Zihan Liu
Shanghai Jiao Tong University
Aiyue Chen
Huawei Technologies Co., Ltd.
Yiwu Yao
Peking University
Artificial Intelligence
Chen Chen
Shanghai Jiao Tong University
Jingwen Leng
Professor, Shanghai Jiao Tong University
Computer Architecture
Yu Feng
Shanghai Jiao Tong University, Shanghai Qizhi Institute
Minyi Guo
IEEE Fellow, Chair Professor, Shanghai Jiao Tong University
Parallel Computing, Compiler Optimization, Cloud Computing, Networking, Big Data