🤖 AI Summary
Existing panoramic video generation methods struggle to maintain geometric consistency under equirectangular projection, primarily due to the inherent complexity of spherical projection modeling and camera pose representation. To address this, we propose CamPVG, the first diffusion-based framework for panoramic video generation guided by precise camera poses. It comprises: (1) a panoramic Plücker embedding that precisely encodes spherical camera extrinsics; (2) a spherical epipolar module that applies adaptive attention masking along epipolar directions to achieve geometrically consistent cross-view feature aggregation; and (3) explicit modeling of spherical projection constraints via spherical coordinate transformations. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in video quality, inter-frame viewpoint coherence, and camera trajectory fidelity. To our knowledge, it is the first method to generate high-fidelity panoramic videos of dynamic scenes with multi-view geometric consistency.
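The panoramic Plücker embedding above can be illustrated with a minimal sketch: each equirectangular pixel is mapped to a unit ray direction on the sphere via spherical coordinates, transformed by the camera extrinsics, and encoded as 6-D Plücker coordinates (direction, origin × direction). Function names, the pixel-to-angle convention, and array shapes are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def panoramic_plucker_embedding(height, width, c2w):
    """Per-pixel Plücker ray embedding for an equirectangular panorama.

    `c2w` is a 4x4 camera-to-world extrinsic matrix. Conventions here
    (longitude/latitude layout, y-up axes) are illustrative choices.
    """
    # Equirectangular projection: pixel centers map linearly to
    # longitude in [-pi, pi) and latitude in (-pi/2, pi/2).
    u = (np.arange(width) + 0.5) / width
    v = (np.arange(height) + 0.5) / height
    lon = (u - 0.5) * 2.0 * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)              # each (H, W)

    # Unit ray directions on the sphere, in the camera frame.
    d_cam = np.stack([np.cos(lat) * np.sin(lon),
                      np.sin(lat),
                      np.cos(lat) * np.cos(lon)], axis=-1)  # (H, W, 3)

    # Rotate directions into world coordinates; the camera center is
    # the common origin of all rays.
    R, o = c2w[:3, :3], c2w[:3, 3]
    d = d_cam @ R.T                               # (H, W, 3)
    m = np.cross(np.broadcast_to(o, d.shape), d)  # moment o × d

    # 6-D Plücker coordinates (d, o × d) per pixel.
    return np.concatenate([d, m], axis=-1)        # (H, W, 6)
```

Unlike a perspective Plücker embedding, which covers only the camera frustum, this parameterization assigns a well-defined ray to every pixel of the full 360° sphere.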
📝 Abstract
Recently, camera-controlled video generation has developed rapidly, offering increasingly precise control over the synthesized content. However, existing methods predominantly focus on camera control for perspective-projection video, while geometrically consistent panoramic video generation remains challenging. This limitation stems primarily from the inherent complexities of panoramic pose representation and spherical projection. To address this issue, we propose CamPVG, the first diffusion-based framework for panoramic video generation guided by precise camera poses. We achieve camera position encoding for panoramic images and cross-view feature aggregation based on spherical projection. Specifically, we propose a panoramic Plücker embedding that encodes camera extrinsic parameters through spherical coordinate transformation. This pose encoder effectively captures panoramic geometry, overcoming the limitations of traditional methods when applied to equirectangular projections. Additionally, we introduce a spherical epipolar module that enforces geometric constraints through adaptive attention masking along epipolar lines. This module enables fine-grained cross-view feature aggregation, substantially enhancing the quality and consistency of generated panoramic videos. Extensive experiments demonstrate that our method generates high-quality panoramic videos consistent with camera trajectories, far surpassing existing methods in panoramic video generation.
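The epipolar masking idea can be sketched as follows. On the sphere, the epipolar "line" for a reference ray is a great circle: the intersection of the unit sphere with the plane spanned by the baseline and the reference ray. A target ray is admitted to cross-view attention if it lies within a small angular band around that circle. The function name, the fixed threshold `tau`, and the hard boolean mask are our simplifying assumptions; the paper describes the masking as adaptive.

```python
import numpy as np

def spherical_epipolar_mask(d_ref, c2w_ref, c2w_tgt, d_tgt, tau=0.05):
    """Epipolar attention mask between two spherical views.

    d_ref:  (3,)  unit ray direction in the reference camera frame.
    d_tgt:  (N, 3) unit ray directions in the target camera frame.
    c2w_*:  4x4 camera-to-world extrinsics. Assumes the baseline is
    not parallel to the reference ray (nondegenerate epipolar plane).
    """
    R_ref, o_ref = c2w_ref[:3, :3], c2w_ref[:3, 3]
    R_tgt, o_tgt = c2w_tgt[:3, :3], c2w_tgt[:3, 3]

    d_ref_w = R_ref @ d_ref            # reference ray, world frame
    t = o_ref - o_tgt                  # baseline, world frame

    # Normal of the epipolar plane through both camera centers and
    # the reference ray; its intersection with the target unit
    # sphere is the epipolar great circle.
    n = np.cross(t, d_ref_w)
    n = n / np.linalg.norm(n)

    d_tgt_w = d_tgt @ R_tgt.T          # target rays, world frame
    # |n . d| = sin(angular distance to the great circle); keep rays
    # within the band as attendable positions.
    return np.abs(d_tgt_w @ n) < tau   # (N,) boolean mask
```

In an attention layer, positions where the mask is False would be assigned a large negative score before the softmax, restricting cross-view aggregation to geometrically plausible correspondences.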