🤖 AI Summary
This work addresses the high computational cost of multi-step denoising in autoregressive video diffusion models. Existing training-free acceleration methods are constrained by a binary choice between caching and recomputation, so they handle intermediate cases inefficiently, and under asynchronous scheduling they treat all effective frames uniformly, which introduces redundant computation. To overcome these limitations, we propose SCOPE, a training-free framework built on a tri-modal scheduling strategy—caching, prediction, and recomputation—augmented with selective computation. Our approach bridges the gap between caching and recomputation via Taylor extrapolation at the noise level, and ensures stability through an error propagation analysis. SCOPE is the first method to integrate tri-modal scheduling with an extrapolation-based prediction mechanism, achieving up to 4.73× acceleration on MAGI-1 and SkyReels-V2 while preserving original video quality and outperforming all existing training-free baselines.
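The prediction mode described above can be illustrated with a minimal sketch. The function below is a hypothetical first-order Taylor extrapolation along the denoising trajectory: given the two most recent cached denoiser outputs, it extrapolates the next one instead of running the full model. The function name and its use of uniform step spacing are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def taylor_predict(f_prev2: np.ndarray, f_prev1: np.ndarray) -> np.ndarray:
    """Predict the next denoiser output via first-order Taylor extrapolation.

    f_prev2, f_prev1: the two most recent cached model outputs, assumed to be
    taken at uniformly spaced noise levels. The finite difference
    (f_prev1 - f_prev2) estimates the derivative along the trajectory, and we
    extrapolate one step forward: f(t) ≈ f(t-1) + (f(t-1) - f(t-2)).
    """
    return f_prev1 + (f_prev1 - f_prev2)
```

For outputs that evolve roughly linearly across adjacent denoising steps, this prediction is nearly exact, which is why it can substitute for a full forward pass in the "intermediate" regime between caching and recomputation.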
📝 Abstract
Autoregressive (AR) video diffusion models enable long-form video generation but remain expensive due to repeated multi-step denoising. Existing training-free acceleration methods rely on binary cache-or-recompute decisions, overlooking intermediate cases where direct reuse is too coarse yet full recomputation is unnecessary. Moreover, asynchronous AR schedules assign different noise levels to co-generated frames, yet existing methods process the entire valid interval uniformly. To address these AR-specific inefficiencies, we present SCOPE, a training-free framework for efficient AR video diffusion. SCOPE introduces a tri-modal scheduler over cache, predict, and recompute, where prediction via noise-level Taylor extrapolation fills the gap between reuse and recomputation with explicit stability controls backed by error propagation analysis. It further introduces selective computation that restricts execution to the active frame interval. On MAGI-1 and SkyReels-V2, SCOPE achieves up to 4.73x speedup while maintaining quality comparable to the original output, outperforming all training-free baselines.
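The tri-modal scheduler and selective computation can be sketched as follows. This is a hypothetical illustration, assuming each frame carries an estimated change magnitude (`delta`) since its last full forward pass and that two thresholds separate the three modes; the function name, threshold scheme, and `skip` mode are assumptions, not the paper's exact policy.

```python
def schedule_step(num_frames, active_range, delta, cache_thresh, predict_thresh):
    """Assign one of three modes per frame: 'cache', 'predict', or 'recompute'.

    active_range: (lo, hi) half-open interval of frames currently being
    denoised; frames outside it are skipped (selective computation).
    delta[i]: estimated change magnitude for frame i since its last recompute.
    Requires cache_thresh < predict_thresh.
    """
    lo, hi = active_range
    modes = {}
    for i in range(num_frames):
        if not (lo <= i < hi):
            modes[i] = "skip"        # outside the active frame interval
        elif delta[i] < cache_thresh:
            modes[i] = "cache"       # negligible change: reuse cached output
        elif delta[i] < predict_thresh:
            modes[i] = "predict"     # intermediate case: Taylor extrapolation
        else:
            modes[i] = "recompute"   # large change: full denoiser forward pass
    return modes
```

The key design point is the middle branch: instead of forcing every frame into a binary cache-or-recompute decision, frames with moderate change get a cheap extrapolated prediction, and frames outside the active interval are never touched at all.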