PanoWorld: Geometry-Consistent Panoramic Video World Modeling

📅 2026-05-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing panoramic video generation methods often produce inconsistent depth and distorted motion because they do not explicitly model 3D scene geometry or dynamics. This work reframes panoramic video generation as a geometry- and dynamics-consistent modeling task: it builds on a pretrained perspective-video world model and adds two regularization terms, one enforcing depth consistency and one enforcing trajectory consistency. To account for spherical geometry, the approach adapts both the conditioning and the positional encoding. The authors also construct PanoGeo, a unified geometry-aware panoramic video dataset used for both training and evaluation. Experiments show that the method improves geometric consistency over prior panoramic generation methods while keeping visual realism competitive, which the authors argue is needed for the holistic spatial understanding that embodied AI requires.
📝 Abstract
We present PanoWorld, a panoramic video world model that generates geometry-consistent $360^\circ$ video from a single image and a caption. Existing panoramic video methods optimize primarily for visual realism and do not explicitly constrain the underlying 3D scene state, producing outputs that appear plausible yet exhibit inconsistent depth, broken correspondences, and implausible motion across the spherical surface. We address this gap by framing panoramic video generation as a geometry- and dynamics-consistent latent state modeling problem rather than pure visual synthesis. Building on a pre-trained perspective video world model, we introduce two lightweight regularizers: a depth consistency loss against pseudo ground-truth panoramic depth, and a trajectory consistency loss that supervises the 3D world-frame positions of tracked points across time. We further apply spherical-geometry-aware adaptation to the conditioning and positional encoding. We additionally introduce PanoGeo, a unified geometry-aware panoramic video dataset with consistent depth, trajectory, and prompt annotations across diverse real and synthetic sources, used for both training and stratified evaluation. Experiments show that PanoWorld improves geometric consistency over prior panoramic generation methods while maintaining competitive visual realism, establishing that panoramic video generation must be treated as a geometric modeling problem to support the holistic spatial understanding requirements of embodied AI applications. Code is available at https://github.com/ostadabbas/PanoWorld.
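The abstract names the two regularizers but does not give their formulas, so the following is only a minimal NumPy sketch of what such terms could look like, not the authors' implementation. The function names, the L1 and Euclidean distance choices, the cos(latitude) weighting for equirectangular frames, and the tensor shapes are all assumptions made for illustration.

```python
import numpy as np

def depth_consistency_loss(pred_depth, pseudo_gt_depth, valid=None):
    """Weighted L1 between predicted and pseudo ground-truth panoramic depth.

    pred_depth, pseudo_gt_depth: arrays of shape (..., H, W), equirectangular.
    valid: optional mask broadcastable to the same shape (1 = use pixel).
    Assumption: rows are weighted by cos(latitude) so that the over-sampled
    polar rows of an equirectangular image do not dominate the average.
    """
    h = pred_depth.shape[-2]
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi  # latitude per row
    weight = np.broadcast_to(np.cos(lat)[:, None], pred_depth.shape).astype(float)
    if valid is not None:
        weight = weight * valid
    err = np.abs(pred_depth - pseudo_gt_depth)
    return float((weight * err).sum() / max(weight.sum(), 1e-8))

def trajectory_consistency_loss(pred_tracks, ref_tracks, visible=None):
    """Mean Euclidean distance between 3D world-frame point tracks.

    pred_tracks, ref_tracks: arrays of shape (T, N, 3) for T frames, N points.
    visible: optional (T, N) mask (1 = point is tracked in that frame).
    """
    dist = np.linalg.norm(pred_tracks - ref_tracks, axis=-1)  # (T, N)
    if visible is None:
        visible = np.ones_like(dist)
    return float((dist * visible).sum() / max(visible.sum(), 1e-8))
```

In a training setup these two scalars would typically be added to the base generation objective with small weights, which matches the abstract's description of them as lightweight regularizers; the actual weighting scheme is not stated in the text above.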
Problem

Research questions and friction points this paper is trying to address.

panoramic video generation
geometry consistency
3D scene modeling
spherical surface
embodied AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometry-consistent generation
panoramic video modeling
depth consistency
trajectory consistency
spherical geometry adaptation
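To make the spherical-geometry keywords above concrete, here is a small sketch of the standard equirectangular mapping from pixel coordinates to unit ray directions, plus lifting a pixel with depth into a world frame. It is an illustration under stated assumptions rather than code from the paper: the helper names are hypothetical, the axis convention (y up, z forward at the image centre) is one common choice, and depth is assumed to be radial distance along the ray.

```python
import numpy as np

def equirect_pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel coordinates to unit ray directions.

    u, v: arrays of column and row coordinates (pixel centres at +0.5).
    Assumed convention: longitude 0 at the image centre, y axis up.
    """
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi    # range [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi   # range (pi/2, -pi/2)
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

def lift_to_world(u, v, depth, width, height, rotation, translation):
    """Lift pixels with radial depth to 3D points in the world frame.

    rotation: (3, 3) camera-to-world rotation; translation: (3,) camera centre.
    """
    rays = equirect_pixel_to_ray(u, v, width, height)  # (..., 3) unit vectors
    cam_points = rays * depth[..., None]               # scale rays by depth
    return cam_points @ rotation.T + translation

# Usage: the image centre looks along +z, so depth 2 from a camera at
# (1, 0, 0) with identity rotation lands at (1, 0, 2).
centre = lift_to_world(np.array([3.5]), np.array([1.5]), np.array([2.0]),
                       8, 4, np.eye(3), np.array([1.0, 0.0, 0.0]))
```

Applying this lifting to tracked pixels in each frame yields 3D world-frame positions of the kind that a trajectory consistency term could compare across time.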