VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation

📅 2026-03-19
🤖 AI Summary
This work addresses the challenge of geometric inconsistency in zero-shot panoramic depth estimation by proposing the first framework based on the VGGT foundation model, reframing depth estimation as a multi-view 3D reprojection task. The approach introduces three key innovations: uncertainty-guided adaptive projection, structure-aware attention, and correlation-weighted 3D model refinement, enabling end-to-end inference from panorama to 3D geometry and back to depth, all without any training. This design inherently unifies local view reasoning with global geometric consistency. Extensive experiments demonstrate that the method consistently outperforms both learning-based and zero-shot alternatives across diverse indoor and outdoor datasets and resolutions, achieving state-of-the-art performance in both geometric coherence and depth accuracy.
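The "3D geometry and back to depth" step can be pictured as an equirectangular reprojection of a reconstructed point cloud. The following is a minimal sketch, not the authors' implementation: the function name `reproject_to_panorama`, the axis convention (+z forward, +y up), and the nearest-point-per-pixel rule are all assumptions made for illustration.

```python
import math

def reproject_to_panorama(points, width, height):
    """Project 3D points (x, y, z), given in the panorama camera frame,
    into an equirectangular depth map (hypothetical helper).

    Assumed convention: +z is forward (longitude 0), +y is up.
    Each pixel keeps the range of the nearest point that lands on it;
    pixels that receive no point stay at math.inf.
    """
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)  # range along the viewing ray
        if r == 0.0:
            continue  # a point at the camera centre has no direction
        lon = math.atan2(x, z)  # longitude in [-pi, pi]
        # Clamp guards against tiny floating-point overshoot before asin.
        lat = math.asin(max(-1.0, min(1.0, y / r)))  # latitude in [-pi/2, pi/2]
        u = int((lon / (2 * math.pi) + 0.5) * width) % width
        v = min(int((0.5 - lat / math.pi) * height), height - 1)
        depth[v][u] = min(depth[v][u], r)
    return depth

# Two points on the same forward ray and one point to the right.
pano = reproject_to_panorama(
    [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, 0.0)], width=8, height=4
)
```

Here the panoramic depth is the Euclidean range to each point rather than a per-view z-depth, which is what makes a single merged point cloud sufficient to render one consistent panorama.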

📝 Abstract
This paper presents VGGT-360, a novel training-free framework for zero-shot, geometry-consistent panoramic depth estimation. Unlike prior view-independent training-free approaches, VGGT-360 reformulates the task as panoramic reprojection over multi-view reconstructed 3D models by leveraging the intrinsic 3D consistency of VGGT-like foundation models, thereby unifying fragmented per-view reasoning into a coherent panoramic understanding. To achieve robust and accurate estimation, VGGT-360 integrates three plug-and-play modules that form a unified panorama-to-3D-to-depth framework: (i) Uncertainty-guided adaptive projection slices panoramas into perspective views to bridge the domain gap between panoramic inputs and VGGT's perspective prior. It estimates gradient-based uncertainty to allocate denser views to geometry-poor regions, yielding geometry-informative inputs for VGGT. (ii) Structure-saliency enhanced attention strengthens VGGT's robustness during 3D reconstruction by injecting structure-aware confidence into its attention layers, guiding focus toward geometrically reliable regions and enhancing cross-view coherence. (iii) Correlation-weighted 3D model correction refines the reconstructed 3D model by reweighting overlapping points using attention-inferred correlation scores, providing a consistent geometric basis for accurate panoramic reprojection. Extensive experiments show that VGGT-360 outperforms both trained and training-free state-of-the-art methods across multiple resolutions and diverse indoor and outdoor datasets.
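To make module (i) concrete, the sketch below distributes a fixed budget of perspective views across longitude sectors in proportion to a per-sector uncertainty score. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not say how the gradient-based uncertainty is computed, so the scores are taken as given, and the function name `allocate_views`, the sector layout, and the `min_per_sector` floor are hypothetical.

```python
def allocate_views(uncertainty, total_views, min_per_sector=1):
    """Split `total_views` across sectors in proportion to `uncertainty`.

    Hypothetical helper: every sector first receives `min_per_sector`
    views so the panorama stays fully covered, then the remaining views
    are shared proportionally using the largest-remainder method.
    """
    n = len(uncertainty)
    if n == 0 or total_views < n * min_per_sector:
        raise ValueError("view budget too small for the number of sectors")
    spare = total_views - n * min_per_sector
    total_u = sum(uncertainty)
    if total_u <= 0:
        shares = [spare / n] * n  # no signal: spread the spare views evenly
    else:
        shares = [spare * u / total_u for u in uncertainty]
    counts = [min_per_sector + int(s) for s in shares]
    # Hand out views lost to rounding, largest fractional share first.
    leftover = total_views - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Four sectors; the last one is the most uncertain and gets the most views.
views = allocate_views([1.0, 3.0, 0.0, 4.0], total_views=12)
```

The floor of one view per sector reflects the assumption that every part of the panorama must appear in at least one perspective slice, while the proportional share concentrates the extra views where the uncertainty is highest.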
Problem

Research questions and friction points this paper is trying to address.

panoramic depth estimation
zero-shot
geometry consistency
3D reconstruction
view synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot depth estimation
geometry consistency
panoramic reprojection
VGGT foundation model
uncertainty-guided projection