OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation models predominantly rely on perspective views, which compromise scene completeness and long-term spatiotemporal consistency, limiting immersive navigation. To address this, the authors propose OmniRoam, a framework that brings panoramic video representation into controllable long-duration generation, enabling high-fidelity scene roaming via trajectory-based control. The approach operates in two stages: it first generates a panoramic overview, then jointly performs temporal extension and spatial super-resolution. Trained on a newly curated dataset of both synthetic and real-world panoramic videos, OmniRoam significantly outperforms existing methods in visual quality, controllability, and long-term consistency. It also supports practical extensions such as real-time generation and 3D reconstruction.
📝 Abstract
Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to issues of completeness and global consistency. We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, where a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refine stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering. To train our model, we introduce two panoramic video datasets that incorporate both synthetic and real-world captured videos. Experiments show that our framework consistently outperforms state-of-the-art methods in terms of visual quality, controllability, and long-term scene consistency, both qualitatively and quantitatively. We further showcase several extensions of this framework, including real-time video generation and 3D reconstruction. Code is available at https://github.com/yuhengliu02/OmniRoam.
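The abstract's two-stage design (a trajectory-controlled preview stage followed by a refine stage that extends the video temporally and upsamples it spatially) can be sketched as a minimal pipeline. Everything below is an illustrative assumption, not the authors' actual code: the function names, the coarse 64×128 equirectangular preview resolution, the 2× temporal / 4× spatial factors, and the placeholder generators standing in for the learned models are all hypothetical.

```python
import numpy as np

def preview_stage(input_image, trajectory, num_frames=16):
    """Stage 1 (sketch): trajectory-controlled generation of a quick,
    low-resolution panoramic overview. A real model would condition on the
    input image and camera trajectory; here random frames stand in for it.
    Equirectangular panoramas use a 2:1 width-to-height aspect ratio."""
    h, w = 64, 128  # coarse panoramic resolution (assumed)
    rng = np.random.default_rng(0)
    return rng.standard_normal((num_frames, h, w, 3))

def refine_stage(preview_video, extend_factor=2, upscale=4):
    """Stage 2 (sketch): joint temporal extension and spatial
    super-resolution. Frame repetition and nearest-neighbor upsampling
    stand in for the learned extension and upsampling modules."""
    extended = np.repeat(preview_video, extend_factor, axis=0)   # more frames
    refined = extended.repeat(upscale, axis=1).repeat(upscale, axis=2)  # higher res
    return refined

# Hypothetical usage: one input image plus a simple forward camera trajectory.
input_image = np.zeros((256, 512, 3))
trajectory = [(0.0, 0.0, float(i)) for i in range(16)]  # (x, y, z) positions
preview = preview_stage(input_image, trajectory)
final = refine_stage(preview)
print(preview.shape, final.shape)  # (16, 64, 128, 3) (32, 256, 512, 3)
```

The key property the paper attributes to this split is that the cheap preview fixes global scene layout early, so the refine stage only has to add temporal range and spatial detail while staying consistent with it.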
Problem

Research questions and friction points this paper is trying to address.

panoramic video generation
scene modeling
long-horizon consistency
global consistency
video synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

panoramic video generation
long-horizon scene wandering
trajectory-controlled generation
spatiotemporal consistency
two-stage refinement