PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

📅 2026-03-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses joint camera pose estimation and 3D reconstruction from panoramic images, whose non-pinhole distortions cause models built on the conventional perspective camera model to generalize poorly. The authors propose PanoVGGT, a permutation-equivariant Transformer that jointly predicts camera poses, depth maps, and 3D point clouds from one or more panoramic images in a single forward pass. To model spherical geometry, the method introduces spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, and it uses a stochastic anchoring strategy during training to resolve the ambiguity in choosing a global reference frame. Experiments on PanoCity, the authors' newly curated large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations, and on standard benchmarks show that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization.
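The link between the predicted depth maps and point clouds, and the idea behind a spherical-aware positional embedding, can be illustrated with a minimal NumPy sketch. It assumes equirectangular panoramas, depth stored as distance along each viewing ray, and a y-up axis convention. The function names and the sin/cos frequency encoding are hypothetical illustrations, not PanoVGGT's published design.

```python
import numpy as np


def equirect_rays(height: int, width: int):
    """Per-pixel latitude, longitude (radians) and unit ray directions.

    Assumed convention (hypothetical): longitude runs from -pi to pi left to
    right, latitude from +pi/2 to -pi/2 top to bottom, and +y points up.
    """
    lon = ((np.arange(width) + 0.5) / width * 2.0 - 1.0) * np.pi   # pixel centers
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    lat, lon = np.meshgrid(lat, lon, indexing="ij")                # both (H, W)
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )                                                              # (H, W, 3), unit norm
    return lat, lon, dirs


def depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Unproject a panoramic depth map (distance along each ray) to 3D points."""
    _, _, dirs = equirect_rays(*depth.shape)
    return depth[..., None] * dirs                                 # (H, W, 3)


def spherical_pos_embedding(height: int, width: int, num_freqs: int = 4) -> np.ndarray:
    """Hypothetical sin/cos encoding of (lat, lon) for each pixel."""
    lat, lon, _ = equirect_rays(height, width)
    k = np.arange(1, num_freqs + 1)          # integer frequencies keep lon 2*pi-periodic
    feats = [np.sin(lon[..., None] * k), np.cos(lon[..., None] * k),
             np.sin(lat[..., None] * k), np.cos(lat[..., None] * k)]
    return np.concatenate(feats, axis=-1)    # (H, W, 4 * num_freqs)


# Usage: a panorama with constant depth 2 unprojects onto a sphere of radius 2.
points = depth_to_points(np.full((8, 16), 2.0))
emb = spherical_pos_embedding(8, 16)
print(points.shape, emb.shape)  # (8, 16, 3) (8, 16, 16)
```

Using integer frequencies on longitude makes the encodings of the leftmost and rightmost columns exactly as close as those of any two adjacent columns, reflecting the wrap-around of a 360° image that a flat 2D grid encoding would miss.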

📝 Abstract
Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or multiple panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.
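Because a panorama covers the whole sphere, it can be rotated about all three axes without cropping any content, which is what makes a three-axis SO(3) augmentation natural for panoramas. The sketch below assumes equirectangular inputs, nearest-neighbor resampling, a y-up convention, and 4x4 camera-to-world pose matrices; the helper names and the Euler-angle sampling are assumptions for illustration, not the paper's procedure.

```python
import numpy as np


def axis_rotation(axis: str, angle: float) -> np.ndarray:
    """3x3 rotation about the x, y, or z axis (right-handed)."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


def random_three_axis_rotation(rng, max_angle: float = np.pi) -> np.ndarray:
    """Hypothetical sampler: independent random angles about x, y and z."""
    ax, ay, az = rng.uniform(-max_angle, max_angle, size=3)
    return axis_rotation("z", az) @ axis_rotation("y", ay) @ axis_rotation("x", ax)


def pixel_dirs(h: int, w: int) -> np.ndarray:
    """Unit ray direction of every equirectangular pixel center, shape (h, w, 3)."""
    lon = ((np.arange(w) + 0.5) / w * 2 - 1) * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lat, lon = np.meshgrid(lat, lon, indexing="ij")
    return np.stack([np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], -1)


def rotate_panorama(image: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Resample so a direction d in the old frame appears at R @ d (nearest neighbor)."""
    h, w = image.shape[:2]
    src = pixel_dirs(h, w) @ R                    # row-vector form of R.T @ d per pixel
    lon = np.arctan2(src[..., 0], src[..., 2])
    lat = np.arcsin(np.clip(src[..., 1], -1.0, 1.0))
    u = np.round((lon / np.pi + 1) / 2 * w - 0.5).astype(int) % w      # wrap horizontally
    v = np.clip(np.round((0.5 - lat / np.pi) * h - 0.5).astype(int), 0, h - 1)
    return image[v, u]


def rotate_pose(cam_to_world: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Camera-to-world pose consistent with the rotated panorama."""
    out = cam_to_world.copy()
    out[:3, :3] = cam_to_world[:3, :3] @ R.T      # new camera coords p' = R p
    return out


# Usage: augment an image, its depth map and its pose with the same rotation.
rng = np.random.default_rng(0)
R = random_three_axis_rotation(rng)
image, depth = rng.random((32, 64, 3)), rng.random((32, 64))
aug_image, aug_depth = rotate_panorama(image, R), rotate_panorama(depth, R)
aug_pose = rotate_pose(np.eye(4), R)
```

The depth map is resampled exactly like the image, since distance along a ray does not change when the camera rotates about its own center; updating the pose keeps image, depth, and pose mutually consistent. Note that sampling Euler angles independently does not give a uniform distribution over SO(3); a uniform sampler is a drop-in alternative if that matters.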
Problem

Research questions and friction points this paper is trying to address.

panoramic imagery
3D reconstruction
pose estimation
non-pinhole distortion
feed-forward model
Innovation

Methods, ideas, or system contributions that make the work stand out.

permutation-equivariant Transformer
spherical-aware positional embeddings
three-axis SO(3) rotation augmentation
stochastic anchoring
panoramic 3D reconstruction
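One plausible reading of stochastic anchoring can be sketched in NumPy. A multi-view reconstruction is defined only up to a choice of global frame, so at each training step the ground-truth poses and point maps are re-expressed in the frame of a randomly chosen view instead of always using a fixed view (such as the first one). The helper names, the 4x4 camera-to-world convention, and this exact procedure are assumptions; the paper's strategy may differ in detail.

```python
import numpy as np


def invert_pose(T: np.ndarray) -> np.ndarray:
    """Invert a rigid 4x4 transform using R^T instead of a general inverse."""
    R, t = T[:3, :3], T[:3, 3]
    inv = np.eye(4)
    inv[:3, :3] = R.T
    inv[:3, 3] = -R.T @ t
    return inv


def anchor_targets(cam_to_world, points_world, rng):
    """Re-express GT poses and point maps in the frame of a random anchor view.

    cam_to_world: (N, 4, 4) camera-to-world poses; points_world: (N, H, W, 3).
    """
    anchor = int(rng.integers(cam_to_world.shape[0]))
    world_to_anchor = invert_pose(cam_to_world[anchor])
    poses = world_to_anchor[None] @ cam_to_world        # anchor pose becomes identity
    R, t = world_to_anchor[:3, :3], world_to_anchor[:3, 3]
    points = points_world @ R.T + t                     # same rigid change of frame
    return anchor, poses, points


def make_pose(yaw: float, translation) -> np.ndarray:
    """Hypothetical demo helper: camera-to-world pose with a rotation about +y."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    T[:3, 3] = translation
    return T


# Usage: three views, random per-view point maps, one random anchor per step.
rng = np.random.default_rng(0)
c2w = np.stack([make_pose(0.0, [0, 0, 0]), make_pose(0.5, [1, 0, 0]),
                make_pose(-1.0, [0, 0, 2])])
pts = rng.normal(size=(3, 4, 8, 3))
anchor, poses, pts_anchored = anchor_targets(c2w, pts, rng)
```

Re-anchoring leaves every relative pose between views unchanged, so the supervision encodes the same geometry while the reference frame varies from step to step, which fits a model that is equivariant to the order of its input views.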