🤖 AI Summary
Traditional visual geometric reconstruction anchors its output to a fixed reference view, leading to instability or outright failure when that reference is poorly chosen. To address this, we propose the first fully permutation-equivariant framework for visual geometric reconstruction, eliminating reference-view dependency entirely. Our method is a feed-forward network with rigorously enforced permutation equivariance that jointly predicts affine-invariant camera poses and scale-invariant local point maps, guaranteeing symmetric, input-order-agnostic handling of arbitrary input image permutations. We validate our approach on three core tasks: monocular/video depth estimation, camera pose estimation, and dense point map reconstruction. It achieves state-of-the-art performance across all benchmarks, with substantial gains in generalization, robustness to input ordering and occlusion, and scalability to varying numbers of input views.
📝 Abstract
We introduce $\pi^3$, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, $\pi^3$ employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
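The permutation equivariance central to the abstract means that reordering the input views reorders the per-view outputs identically, with no view acting as a privileged reference. A minimal sketch of the property (a toy Deep-Sets-style layer in NumPy, not the paper's actual architecture) is:

```python
import numpy as np

# Toy illustration only — not the $\pi^3$ model. A layer of the form
#   f(x)_i = W1 @ x_i + W2 @ mean_j(x_j)
# (per-view transform plus a pooled, order-invariant context term)
# is permutation-equivariant: permuting the input views permutes
# the outputs the same way, so no view is a privileged reference.
rng = np.random.default_rng(0)
n_views, dim = 5, 4
W1 = rng.standard_normal((dim, dim))
W2 = rng.standard_normal((dim, dim))

def equivariant_layer(x):
    # x: (n_views, dim), one feature vector per input view.
    # The mean over views is unchanged by any permutation of rows.
    return x @ W1.T + x.mean(axis=0) @ W2.T  # broadcasts over views

x = rng.standard_normal((n_views, dim))
perm = rng.permutation(n_views)

# Permuting then applying the layer equals applying then permuting.
assert np.allclose(equivariant_layer(x[perm]), equivariant_layer(x)[perm])
```

Attention layers without positional encodings across the view axis have the same property, which is one standard way such equivariance is enforced in practice.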