AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path

📅 2025-12-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Autoregressive video diffusion models (AR-VDMs) suffer from insufficient sample fidelity, while existing inference-time alignment methods incur high computational overhead and lack adaptability. Method: We propose the first path-wise noise refinement framework tailored for AR-VDMs. During inference, it performs reflective noise refinement along stochastic denoising trajectories, requiring no model fine-tuning or parameter updates. Key innovations include a path-aware noise reweighting mechanism, a feedforward noise modulation module, and a reflective KV cache designed to preserve autoregressive dependencies, which together address the failure of directly adapting text-to-image noise refiners. Contribution/Results: As a lightweight plug-in, our method adds negligible computational overhead while significantly improving inter-frame consistency and fine-grained detail fidelity. It remains compatible with real-time video generation and interactive applications.

📝 Abstract

Autoregressive video diffusion models (AR-VDMs) show strong promise as scalable alternatives to bidirectional VDMs, enabling real-time and interactive applications. Yet there remains room for improvement in their sample fidelity. A promising solution is inference-time alignment, which optimizes the noise space to improve sample fidelity without updating model parameters. However, optimization- or search-based methods are computationally impractical for AR-VDMs. Recent text-to-image (T2I) works address this via feedforward noise refiners that modulate sampled noises in a single forward pass. Can such noise refiners be extended to AR-VDMs? We identify the failure of naively extending T2I noise refiners to AR-VDMs and propose AutoRefiner, a noise refiner tailored for AR-VDMs, with two key designs: pathwise noise refinement and a reflective KV-cache. Experiments demonstrate that AutoRefiner serves as an efficient plug-in for AR-VDMs, effectively enhancing sample fidelity by refining noise along stochastic denoising paths.

Problem

Research questions and friction points this paper is trying to address.

Enhances sample fidelity in autoregressive video diffusion models

Extends noise refinement techniques from text-to-image to video generation

Optimizes stochastic denoising paths without updating model parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tailored noise refiner for autoregressive video diffusion models

Pathwise noise refinement along stochastic denoising paths

Reflective KV-cache design for efficient single-pass modulation
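
To make the idea of refining noise along a stochastic sampling path more concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the names `refine_noise` and `sample_path`, the `gain` parameter, the hand-written refiner rule, and the toy denoiser are all hypothetical stand-ins. The sketch assumes a few-step sampler that predicts a clean sample and then re-noises it to the next noise level, with a single feedforward call modulating each freshly drawn noise. The reflective KV-cache is not modeled here.

```python
import math
import random


def refine_noise(noise, context, gain=0.1):
    """Hypothetical feedforward refiner: one pass, no search or optimization.

    Nudges the noise toward `context`, then standardizes the result so it
    keeps zero mean and unit variance (a toy stand-in for a learned module).
    """
    shifted = [n + gain * c for n, c in zip(noise, context)]
    mean = sum(shifted) / len(shifted)
    var = sum((s - mean) ** 2 for s in shifted) / len(shifted)
    std = math.sqrt(var) or 1.0  # guard against a degenerate constant vector
    return [(s - mean) / std for s in shifted]


def sample_path(denoise, refine, dim, sigmas, rng):
    """Toy stochastic few-step sampler with optional per-step noise refinement.

    `sigmas` is a decreasing list of noise levels; the last one is usually 0.
    At every step the model's clean-sample estimate is re-noised with fresh
    noise, and that noise is what the refiner modulates.
    """
    x = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in range(dim)]
    for sigma_next in sigmas[1:]:
        x0 = denoise(x)  # predicted clean sample at this step
        fresh = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        noise = refine(fresh, x0) if refine is not None else fresh
        x = [a + sigma_next * n for a, n in zip(x0, noise)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    toy_denoise = lambda x: [0.5 * v for v in x]  # assumption: shrink toward 0
    sample = sample_path(toy_denoise, refine_noise, 8, [1.0, 0.5, 0.0], rng)
    print(len(sample))
```

The point of the sketch is only the control flow: the refiner is called once per injected noise along the path, and the sampler's parameters are never updated, which matches the inference-time, single-forward-pass framing described above.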

🔎 Similar Papers

Pyramidal Flow Matching for Efficient Video Generative Modeling