Endless World: Real-Time 3D-Aware Long Video Generation

📅 2025-12-13

📈 Citations: 0

✨ Influential: 0

career value

248K/year

🤖 AI Summary

To address the challenges of 3D structural instability and inefficient real-time streaming inference in long-duration video generation, this paper proposes an infinite-length, 3D-consistent video generation framework tailored for streaming scenarios. The method introduces three key innovations: (1) a conditional autoregressive training strategy enabling cache-free, continuous inference on a single GPU; (2) a global 3D-aware attention mechanism that explicitly models inter-frame geometric consistency; and (3) a 3D injection module coupled with implicit geometric guidance to enforce physical plausibility in depth and motion estimation. Our approach achieves real-time, stable 3D video synthesis at minute-scale durations while preserving high visual fidelity—marking the first such capability. Quantitative and qualitative evaluations demonstrate significant improvements over state-of-the-art methods in geometric consistency, temporal coherence, and inference efficiency.

Technology Category

Application Category

📝 Abstract

Producing long, coherent video sequences with stable 3D structure remains a major challenge, particularly in streaming scenarios. Motivated by this, we introduce Endless World, a real-time framework for infinite, 3D-consistent video generation.To support infinite video generation, we introduce a conditional autoregressive training strategy that aligns newly generated content with existing video frames. This design preserves long-range dependencies while remaining computationally efficient, enabling real-time inference on a single GPU without additional training overhead.Moreover, our Endless World integrates global 3D-aware attention to provide continuous geometric guidance across time. Our 3D injection mechanism enforces physical plausibility and geometric consistency throughout extended sequences, addressing key challenges in long-horizon and dynamic scene synthesis.Extensive experiments demonstrate that Endless World produces long, stable, and visually coherent videos, achieving competitive or superior performance to existing methods in both visual fidelity and spatial consistency. Our project has been available on https://bwgzk-keke.github.io/EndlessWorld/.

Problem

Research questions and friction points this paper is trying to address.

Generates infinite, 3D-consistent long videos in real-time.

Maintains geometric consistency and physical plausibility over extended sequences.

Preserves long-range dependencies efficiently for real-time single-GPU inference.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time infinite 3D-consistent video generation framework

Conditional autoregressive training for long-range content alignment

Global 3D-aware attention with geometric consistency injection

🔎 Similar Papers

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency