🤖 AI Summary
Existing diffusion models face significant challenges in generating long videos: training for long sequences is computationally prohibitive, while training-free extensions of short-video models suffer from insufficient motion dynamics and degraded fidelity. To address these issues, the paper proposes Brick-Diffusion, a training-free approach for generating videos of arbitrary length. Its core is a brick-to-wall denoising strategy: the long-video latent is denoised in fixed-length segments ("bricks"), and a stride is applied in subsequent iterations so that segment boundaries are staggered across denoising steps, like the courses of a brick wall. This staggering lets information propagate between neighboring segments, overcoming the fixed-length limitation of pretrained short-video diffusion models and improving inter-frame consistency and motion dynamics. Quantitative and qualitative evaluations show that Brick-Diffusion outperforms existing baseline methods in generating high-fidelity long videos, all without retraining or architectural modification.
📝 Abstract
Recent advances in diffusion models have greatly improved text-driven video generation. However, training models for long video generation demands significant computational power and extensive data, so most video diffusion models are limited to a small number of frames. Existing training-free methods that attempt to generate long videos with pre-trained short-video diffusion models often struggle with insufficient motion dynamics and degraded video fidelity. In this paper, we present Brick-Diffusion, a novel, training-free approach capable of generating long videos of arbitrary length. Our method introduces a brick-to-wall denoising strategy: the latent is denoised in segments, with a stride applied in subsequent iterations. This process mimics the construction of a staggered brick wall, where each brick represents a denoised segment, enabling communication between frames and improving overall video quality. Through quantitative and qualitative evaluations, we demonstrate that Brick-Diffusion outperforms existing baseline methods in generating high-fidelity videos.
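The brick-to-wall schedule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `denoise_step` stands in for one denoising step of a pretrained short-video model, and the brick length and stride values are hypothetical placeholders.

```python
import numpy as np

def brick_to_wall_denoise(latent, denoise_step, num_steps,
                          brick_len=16, stride=8):
    """Sketch of brick-to-wall denoising (hypothetical API).

    latent: array of shape (T, C, H, W) holding T frames of video latent.
    denoise_step(segment, t): one denoising step of a short-video model
    applied to a brick of at most brick_len frames.

    On alternating iterations, the brick boundaries are shifted by
    `stride`, so brick edges from one step fall inside bricks of the
    next step -- like staggered courses in a brick wall -- letting
    information flow between neighboring segments.
    """
    T = latent.shape[0]
    for t in range(num_steps):
        # Stagger every other "row" of bricks by the stride.
        offset = stride if t % 2 == 1 else 0
        start = 0
        if offset:
            # Leading partial brick created by the offset.
            latent[:offset] = denoise_step(latent[:offset], t)
            start = offset
        while start < T:
            end = min(start + brick_len, T)
            latent[start:end] = denoise_step(latent[start:end], t)
            start = end
    return latent
```

Because the seams between bricks move from step to step, no frame boundary is frozen across the whole denoising trajectory, which is what allows a model trained on short clips to produce a temporally coherent long video.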