PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

πŸ“… 2026-03-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limitations of conventional perspective camera models in joint pose estimation and 3D reconstruction from panoramic images, which suffer from non-pinhole distortions that hinder generalization. The authors propose PanoVGGT, a permutation-equivariant Transformer framework capable of jointly predicting camera poses, depth maps, and 3D point clouds from one or multiple panoramic images in a single forward pass. The method introduces spherical-aware positional encoding, a panorama-specific three-axis SO(3) rotation augmentation, and a stochastic anchoring training strategy to effectively model spherical geometry and alleviate ambiguities arising from global coordinate alignment. Extensive experiments on the authors' newly curated large-scale panoramic dataset, PanoCity, along with standard benchmarks, demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization.
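The spherical-aware positional encoding can be pictured with a minimal sketch: instead of encoding raw (row, column) indices, map each equirectangular pixel to its unit-sphere viewing direction and apply sinusoidal features to that 3D direction, so pixels that are close on the sphere get similar codes even near the poles. The function name and frequency scheme below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def spherical_positional_encoding(h, w, num_freqs=4):
    """Sinusoidal encoding of unit-sphere directions on an equirectangular
    grid (hypothetical sketch, not the published architecture)."""
    # Pixel centers -> latitude in [pi/2, -pi/2], longitude in (-pi, pi)
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2 * np.pi
    lon_g, lat_g = np.meshgrid(lon, lat)                 # each (h, w)
    # Unit viewing direction per pixel
    dirs = np.stack([np.cos(lat_g) * np.cos(lon_g),
                     np.cos(lat_g) * np.sin(lon_g),
                     np.sin(lat_g)], axis=-1)            # (h, w, 3)
    # Multi-frequency sin/cos features of the 3D direction
    feats = []
    for k in range(num_freqs):
        feats.append(np.sin(2.0 ** k * np.pi * dirs))
        feats.append(np.cos(2.0 ** k * np.pi * dirs))
    return np.concatenate(feats, axis=-1)                # (h, w, 6 * num_freqs)

pe = spherical_positional_encoding(8, 16)                # (8, 16, 24)
```

Because the code is a function of the 3D direction rather than pixel coordinates, it varies smoothly across the 360° wrap-around seam, unlike a standard 2D positional encoding.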

πŸ“ Abstract
Panoramic imagery offers a full 360Β° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or multiple panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.
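The three-axis SO(3) rotation augmentation mentioned in the abstract can be sketched as a pixel remapping: rotating an equirectangular panorama amounts to pushing each output pixel's viewing direction through the inverse rotation and sampling the source image there. The helper names and sampling choice (nearest neighbour) below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rotmat(axis, angle):
    """Rodrigues rotation matrix about a unit axis."""
    a = np.asarray(axis, float)
    a = a / np.linalg.norm(a)
    K = np.array([[0, -a[2], a[1]],
                  [a[2], 0, -a[0]],
                  [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def rotate_panorama(img, R):
    """Apply a 3D rotation R to an equirectangular panorama by remapping
    each output pixel's viewing direction through R^T (nearest-neighbour)."""
    h, w = img.shape[:2]
    j, i = np.meshgrid(np.arange(w), np.arange(h))
    lat = (0.5 - (i + 0.5) / h) * np.pi
    lon = ((j + 0.5) / w - 0.5) * 2 * np.pi
    # Row-vector product d @ R equals R^T applied to each direction d
    d = np.stack([np.cos(lat) * np.cos(lon),
                  np.cos(lat) * np.sin(lon),
                  np.sin(lat)], axis=-1) @ R
    src_lat = np.arcsin(np.clip(d[..., 2], -1.0, 1.0))
    src_lon = np.arctan2(d[..., 1], d[..., 0])
    si = np.clip(((0.5 - src_lat / np.pi) * h - 0.5).round().astype(int), 0, h - 1)
    sj = (((src_lon / (2 * np.pi) + 0.5) * w - 0.5).round().astype(int)) % w
    return img[si, sj]

# Three-axis augmentation: compose random yaw, pitch, and roll
rng = np.random.default_rng(0)
R = (rotmat([0, 0, 1], rng.uniform(-np.pi, np.pi))   # yaw
     @ rotmat([0, 1, 0], rng.uniform(-0.2, 0.2))     # pitch
     @ rotmat([1, 0, 0], rng.uniform(-0.2, 0.2)))    # roll
```

A pure yaw rotation reduces to a circular column shift, which makes the remap easy to sanity-check; pitch and roll additionally exercise the pole regions, where equirectangular distortion is strongest.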
Problem

Research questions and friction points this paper is trying to address.

panoramic imagery
3D reconstruction
pose estimation
non-pinhole distortion
feed-forward model
Innovation

Methods, ideas, or system contributions that make the work stand out.

permutation-equivariant Transformer
spherical-aware positional embeddings
three-axis SO(3) rotation augmentation
stochastic anchoring
panoramic 3D reconstruction
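The stochastic anchoring listed above can be sketched as follows: during training, pick a random input view as the coordinate anchor and re-express all ground-truth poses relative to it, so the network never has to commit to an arbitrary global frame. The function name and pose convention (world-to-camera 4x4 matrices) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def random_anchor_poses(poses_w2c, rng):
    """Re-express ground-truth poses relative to a randomly chosen anchor
    view (hypothetical sketch of a stochastic anchoring strategy).

    poses_w2c: (N, 4, 4) world-to-camera transforms.
    Returns the relative poses (anchor becomes identity) and the anchor index.
    """
    n = poses_w2c.shape[0]
    a = rng.integers(n)                          # random anchor per sample
    T_anchor_inv = np.linalg.inv(poses_w2c[a])   # anchor camera -> world
    # T_i^rel = T_i @ T_a^{-1}: maps anchor-camera coords to camera i coords
    return poses_w2c @ T_anchor_inv, a
```

The relative poses are invariant to any rigid transform applied to the world frame, which is exactly the ambiguity the anchoring is meant to remove.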
πŸ”Ž Similar Papers
No similar papers found.
Yijing Guo
ShanghaiTech University
Mengjun Chao
ShanghaiTech University
Luo Wang
ShanghaiTech University
Tianyang Zhao
ShanghaiTech University
Haizhao Dai
ShanghaiTech University
Yingliang Zhang
DGene
Neural Representation · Light Field · 3D Reconstruction
Jingyi Yu
Professor, ShanghaiTech University
Computer Vision · Computer Graphics
Yujiao Shi
ShanghaiTech University
3D Computer Vision