Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation

📅 2026-04-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high computational cost of multi-step denoising in autoregressive (AR) video diffusion models. Existing training-free acceleration methods face a binary choice between reusing a cached output and recomputing it. As a result, they handle intermediate cases poorly, where direct reuse is too coarse but full recomputation is unnecessary. They also process every valid frame uniformly, even though asynchronous AR schedules assign different noise levels to co-generated frames, which wastes computation. To overcome these limitations, the authors propose SCOPE, a training-free framework that combines a tri-modal scheduler (cache, predict, or recompute) with selective computation. Prediction via Taylor extrapolation over the noise level bridges the gap between caching and recomputation, and stability controls grounded in error propagation analysis keep the extrapolation reliable. SCOPE is presented as the first method to pair tri-modal scheduling with extrapolation-based prediction. It achieves up to 4.73× speedup on MAGI-1 and SkyReels-V2 while maintaining quality comparable to the original output, outperforming all existing training-free baselines.
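The cache/predict/recompute decision and the noise-level extrapolation can be illustrated with a minimal Python sketch. This is not SCOPE's implementation. The drift score, the thresholds `cache_tol` and `predict_tol`, the `max_skips` cap, and the names `taylor_extrapolate` and `TriModalScheduler` are hypothetical choices made for illustration. Prediction is a first-order Taylor step in the noise level σ, with the derivative estimated by a finite difference over the two most recent recomputed outputs. Only recomputed outputs enter the history, and the cap forces a recompute after a few skipped steps. Together these act as a simple stand-in for the paper's stability controls.

```python
import numpy as np


def taylor_extrapolate(sigma, hist):
    """First-order Taylor extrapolation in noise level sigma.

    hist holds the two most recent (sigma_i, output_i) pairs from real
    recomputations; d(output)/d(sigma) is estimated by a finite difference.
    """
    (s_prev, f_prev), (s_last, f_last) = hist[-2], hist[-1]
    deriv = (f_last - f_prev) / (s_last - s_prev)
    return f_last + (sigma - s_last) * deriv


class TriModalScheduler:
    """Hypothetical cache / predict / recompute scheduler (not SCOPE's code)."""

    def __init__(self, cache_tol=0.02, predict_tol=0.10, max_skips=2):
        self.cache_tol = cache_tol      # drift below this: reuse cached output
        self.predict_tol = predict_tol  # drift below this: extrapolate
        self.max_skips = max_skips      # force a recompute after this many skips
        self.hist = []                  # (sigma, output) from recomputations only
        self.skips = 0

    def _choose(self):
        if len(self.hist) < 2 or self.skips >= self.max_skips:
            return "recompute"
        (_, f_prev), (_, f_last) = self.hist
        # Relative change between the last two recomputed outputs.
        drift = np.linalg.norm(f_last - f_prev) / (np.linalg.norm(f_last) + 1e-8)
        if drift < self.cache_tol:
            return "cache"
        if drift < self.predict_tol:
            return "predict"
        return "recompute"

    def step(self, sigma, model):
        mode = self._choose()
        if mode == "recompute":
            out = model(sigma)
            self.hist = (self.hist + [(sigma, out)])[-2:]
            self.skips = 0
        elif mode == "cache":
            out = self.hist[-1][1]
            self.skips += 1
        else:
            out = taylor_extrapolate(sigma, self.hist)
            self.skips += 1
        return mode, out


if __name__ == "__main__":
    def toy_model(s):
        # Toy "denoiser" whose output is linear in sigma.
        return (s + 1.0) * np.ones(4)

    sched = TriModalScheduler()
    modes = [sched.step(s, toy_model)[0] for s in [1.0, 0.9, 0.8, 0.7, 0.6]]
    print(modes)  # ['recompute', 'recompute', 'predict', 'predict', 'recompute']
```

Because the toy model is linear in σ, the first-order extrapolation is exact here. With a real network the drift thresholds decide how much approximation error is tolerated before falling back to recomputation.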


📝 Abstract
Autoregressive (AR) video diffusion models enable long-form video generation but remain expensive due to repeated multi-step denoising. Existing training-free acceleration methods rely on binary cache-or-recompute decisions, overlooking intermediate cases where direct reuse is too coarse yet full recomputation is unnecessary. Moreover, asynchronous AR schedules assign different noise levels to co-generated frames, yet existing methods process the entire valid interval uniformly. To address these AR-specific inefficiencies, we present SCOPE, a training-free framework for efficient AR video diffusion. SCOPE introduces a tri-modal scheduler over cache, predict, and recompute, where prediction via noise-level Taylor extrapolation fills the gap between reuse and recomputation with explicit stability controls backed by error propagation analysis. It further introduces selective computation that restricts execution to the active frame interval. On MAGI-1 and SkyReels-V2, SCOPE achieves up to 4.73x speedup while maintaining quality comparable to the original output, outperforming all training-free baselines.
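Selective computation can be sketched in the same hedged way. The sketch assumes a per-frame noise schedule and uses a hypothetical definition of "active": a frame is active when its noise level changes between the current and next step. It finds the contiguous active interval and runs a placeholder denoiser (`denoise_fn`, hypothetical) only on that slice, leaving finished and not-yet-started frames untouched. In a real AR video model, clean context frames would still contribute through attention or a KV cache, which this sketch omits.

```python
import numpy as np


def active_interval(sig_now, sig_next):
    """Return [start, end) spanning frames whose noise level changes this step.

    Frames that are already clean or not yet scheduled keep the same noise
    level and are treated as inactive (hypothetical definition).
    """
    moving = np.flatnonzero(sig_now != sig_next)
    if moving.size == 0:
        return 0, 0
    return int(moving[0]), int(moving[-1]) + 1


def selective_step(latents, sig_now, sig_next, denoise_fn):
    """Apply denoise_fn only to the active frame interval; copy the rest."""
    start, end = active_interval(sig_now, sig_next)
    out = latents.copy()
    if end > start:
        out[start:end] = denoise_fn(
            latents[start:end], sig_now[start:end], sig_next[start:end]
        )
    return out, (start, end)


if __name__ == "__main__":
    # 6 frames: two clean context frames, three in flight, one still pending.
    sig_now = np.array([0.0, 0.0, 0.3, 0.6, 0.9, 1.0])
    sig_next = np.array([0.0, 0.0, 0.0, 0.3, 0.6, 1.0])
    latents = np.zeros((6, 2))

    def toy_denoise(x, a, b):
        # Placeholder update: shift each frame by its noise-level decrement.
        return x - (a - b)[:, None]

    _, interval = selective_step(latents, sig_now, sig_next, toy_denoise)
    print(interval)  # (2, 5): only 3 of 6 frames are processed
```

Using a single contiguous interval rather than a per-frame mask keeps the slice dense, which matches how co-generated frames typically form one window in pipelined AR schedules.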
Problem

Research questions and friction points this paper is trying to address.

autoregressive video generation
diffusion models
computational efficiency
selective computation
noise-level scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Computation
Predictive Extrapolation
Autoregressive Video Diffusion
Training-Free Acceleration
Tri-Modal Scheduler