Fisheye3R: Adapting Unified 3D Feed-Forward Foundation Models to Fisheye Lenses

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of existing foundation models for 3D reconstruction when applied to fisheye images, primarily caused by the nonlinear projection-induced spatial distortion and the scarcity of ground-truth fisheye training data. To overcome this limitation, the authors propose Fisheye3R, a novel framework that, for the first time, enables adaptation of general-purpose 3D reconstruction models to highly distorted fisheye inputs without requiring any fisheye ground truth. Fisheye3R integrates self-supervised learning—using only unlabeled perspective images—with supervised adaptation strategies and is compatible with diverse foundation architectures such as VGGT, π³, and MapAnything. Experiments demonstrate that Fisheye3R substantially improves reconstruction accuracy on fisheye imagery across multiple tasks, including camera pose, depth, point cloud, and field-of-view estimation, while fully preserving the original model performance on perspective images.
📝 Abstract
Feed-forward foundation models for multi-view 3-dimensional (3D) reconstruction have been trained on large-scale datasets of perspective images; when tested on wide field-of-view images, e.g., from a fisheye camera, their performance degrades. This error arises from changes in the spatial positions of pixels due to a non-linear projection model that maps 3D points onto the 2D image plane. While one may surmise that training on fisheye images would resolve this problem, there are far fewer fisheye images with ground truth than perspective images, which limits generalization. To enable inference on imagery exhibiting high radial distortion, we propose Fisheye3R, a novel adaptation framework that extends these multi-view 3D reconstruction foundation models to natively accommodate fisheye inputs without performance regression on perspective images. To address the scarcity of fisheye images and ground truth, we introduce flexible learning schemes that support self-supervised adaptation using only unlabeled perspective images and supervised adaptation without any fisheye training data. Extensive experiments across three foundation models, including VGGT, $π^3$, and MapAnything, demonstrate that our approach consistently improves camera pose, depth, point map, and field-of-view estimation on fisheye images.
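The abstract attributes the performance drop to the non-linear projection that maps 3D points onto the fisheye image plane. As an illustrative sketch only (the paper's exact lens model is not specified here), the snippet below contrasts a pinhole projection with the common equidistant fisheye model, where the image radius grows linearly with the angle off the optical axis (r = f·θ) rather than with its tangent (r = f·tan θ) — the source of the radial distortion discussed above:

```python
import numpy as np

def project_pinhole(point_3d, f):
    """Perspective (pinhole) projection: image radius r = f * tan(theta)."""
    x, y, z = point_3d
    return np.array([f * x / z, f * y / z])

def project_equidistant_fisheye(point_3d, f):
    """Equidistant fisheye projection: image radius r = f * theta,
    where theta is the angle between the ray and the optical axis.
    (One common fisheye model; used here only for illustration.)"""
    x, y, z = point_3d
    theta = np.arctan2(np.hypot(x, y), z)  # angle off the optical axis
    phi = np.arctan2(y, x)                 # azimuth in the image plane
    r = f * theta
    return np.array([r * np.cos(phi), r * np.sin(phi)])

# The same off-axis 3D point lands at different image radii under the two
# models, and the gap widens with the field angle -- the "changes in
# spatial positions of pixels" that degrade perspective-trained models.
p = np.array([1.0, 0.0, 1.0])  # 45 degrees off the optical axis
f = 500.0
print(project_pinhole(p, f)[0])              # 500.0   (f * tan 45 deg)
print(project_equidistant_fisheye(p, f)[0])  # ~392.7  (f * pi/4)
```

At wider field angles the pinhole radius diverges (tan θ → ∞ as θ → 90°) while the equidistant radius stays finite, which is why fisheye lenses can capture fields of view beyond what a perspective model can represent.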
Problem

Research questions and friction points this paper is trying to address.

fisheye images
3D reconstruction
radial distortion
foundation models
domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

fisheye adaptation
3D reconstruction
foundation models
self-supervised learning
radial distortion