🤖 AI Summary
Existing monocular 3D estimation methods rely on the pinhole camera model or on image rectification, which limits generalization to wide-angle cameras (e.g., fisheye, omnidirectional) and causes geometric distortion and context loss. To address this, we propose the first universal monocular 3D estimation framework compatible with arbitrary camera models. Our method introduces a spherical 3D representation and encodes geometry with a model-agnostic representation of the ray bundle as a learned superposition of spherical harmonics. We further design an angular-space loss that mitigates the contraction of 3D outputs under wide field-of-view conditions and enables joint optimization of depth and intrinsic camera parameters. The framework supports zero-shot transfer across camera types without retraining. Evaluated on 13 diverse datasets, including large-FOV and panoramic scenes, our approach significantly outperforms state-of-the-art methods while maintaining top-tier accuracy on standard pinhole, narrow-FOV benchmarks.
📝 Abstract
Monocular 3D estimation is crucial for visual perception. However, current methods fall short by relying on oversimplified assumptions, such as pinhole camera models or rectified images. These limitations severely restrict their general applicability, causing poor performance in real-world scenarios with fisheye or panoramic images and resulting in substantial context loss. To address this, we present UniK3D, the first generalizable method for monocular 3D estimation able to model any camera. Our method introduces a spherical 3D representation which allows for better disentanglement of camera and scene geometry and enables accurate metric 3D reconstruction for unconstrained camera models. Our camera component features a novel, model-independent representation of the pencil of rays, achieved through a learned superposition of spherical harmonics. We also introduce an angular loss, which, together with the camera module design, prevents the contraction of the 3D outputs for wide-view cameras. A comprehensive zero-shot evaluation on 13 diverse datasets demonstrates the state-of-the-art performance of UniK3D across 3D, depth, and camera metrics, with substantial gains in challenging large-field-of-view and panoramic settings, while maintaining top accuracy in conventional pinhole small-field-of-view domains. Code and models are available at github.com/lpiccinelli-eth/unik3d.
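The camera representation described above, a pencil of rays expressed as a learned superposition of spherical harmonics, can be illustrated with a minimal NumPy sketch. Here a degree-2 real spherical harmonic basis is evaluated at coarse per-pixel angles, and learned coefficients superpose the basis into angular offsets that refine each pixel's unit ray direction. The degree-2 cutoff, the offset parameterization, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical harmonics up to degree 2 (9 basis functions),
    evaluated at polar angle theta and azimuth phi (illustrative basis;
    the actual model may use a different degree or normalization)."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.stack([
        np.ones_like(x),        # l = 0
        y, z, x,                # l = 1
        x * y, y * z,           # l = 2
        3.0 * z ** 2 - 1.0,
        x * z, x ** 2 - y ** 2,
    ], axis=-1)

def rays_from_coefficients(theta0, phi0, coeffs):
    """Superpose the SH basis with learned coefficients (shape (9, 2))
    to get per-pixel (theta, phi) offsets, then convert the refined
    angles into unit ray directions on the sphere."""
    basis = sh_basis(theta0, phi0)      # (..., 9)
    delta = basis @ coeffs              # (..., 2) angular offsets
    theta = theta0 + delta[..., 0]
    phi = phi0 + delta[..., 1]
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
```

Because rays live on the unit sphere rather than on a pinhole image plane, the same coefficient formulation covers fisheye and panoramic geometries without a model-specific projection.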
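The angular loss can be sketched in the same spirit: instead of penalizing Euclidean distance between predicted and ground-truth 3D points, which lets wide-FOV predictions contract toward the optical axis at little cost, one penalizes the angle between them as directions from the camera center. This is one plausible minimal form under that assumption, not the paper's exact loss; the function name and epsilon handling are illustrative.

```python
import numpy as np

def angular_loss(pred, target, eps=1e-12):
    """Mean angle (radians) between predicted and target 3D points,
    treated as ray directions from the camera center.

    Unlike a Euclidean loss, the angular penalty does not shrink for
    points near the border of a wide-FOV image, discouraging the
    contraction of 3D outputs described in the abstract."""
    p = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cos_angle = np.clip(np.sum(p * t, axis=-1), -1.0, 1.0)
    return np.arccos(cos_angle).mean()
```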