Calib3R: A 3D Foundation Model for Multi-Camera to Robot Calibration and 3D Metric-Scaled Scene Reconstruction

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
RGB vision tasks for robotics require metric-scale 3D scene representations aligned with the robot’s body frame; however, camera-to-robot extrinsic calibration and dense 3D reconstruction are typically treated separately, and conventional approaches either rely on calibration targets or suffer from scale ambiguity. Method: We propose a target-free, joint multi-camera-to-robot calibration and metric reconstruction framework. For the first time, we leverage a 3D foundation model (MASt3R) for cross-modal geometric alignment, enabling end-to-end co-optimization of camera extrinsics and dense point-cloud reconstruction within a non-parametric framework by jointly fusing visual features and robot pose observations. Contribution/Results: The method supports both monocular and multi-camera systems, achieves high-precision calibration—outperforming target-free and marker-based baselines—using fewer than ten input images across diverse datasets, and outputs scale-accurate, robot-frame-aligned 3D maps.

📝 Abstract
Robots often rely on RGB images for tasks like manipulation and navigation. However, reliable interaction typically requires a 3D scene representation that is metric-scaled and aligned with the robot reference frame. This depends on accurate camera-to-robot calibration and dense 3D reconstruction, tasks usually treated separately, despite both relying on geometric correspondences from RGB data. Traditional calibration needs patterns, while RGB-based reconstruction yields geometry with an unknown scale in an arbitrary frame. Multi-camera setups add further complexity, as data must be expressed in a shared reference frame. We present Calib3R, a patternless method that jointly performs camera-to-robot calibration and metric-scaled 3D reconstruction via unified optimization. Calib3R handles single- and multi-camera setups on robot arms or mobile robots. It builds on the 3D foundation model MASt3R to extract pointmaps from RGB images, which are combined with robot poses to reconstruct a scaled 3D scene aligned with the robot. Experiments on diverse datasets show that Calib3R achieves accurate calibration with fewer than 10 images, outperforming target-less and marker-based methods.
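The abstract describes combining MASt3R pointmaps with robot poses to produce a reconstruction that is metric-scaled and expressed in the robot frame. The essence of such an alignment can be illustrated with a Umeyama-style similarity fit between corresponding 3D points (e.g., camera centers estimated from pointmaps vs. the same poses known from robot kinematics). This is a minimal sketch of the general technique, not the paper's actual joint optimization; the function name and setup are illustrative assumptions.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    mapping src points onto dst points: dst_i ~= s * R @ src_i + t.

    src: (N, 3) points in the arbitrary, unscaled reconstruction frame.
    dst: (N, 3) corresponding points in the metric robot frame.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d

    # Cross-covariance between the two centered point sets.
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force a proper rotation (det(R) = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src  # recovers the metric scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Applying the recovered (s, R, t) to the whole point cloud yields a metric-scaled map aligned with the robot frame; the paper's framework goes further by co-optimizing this alignment jointly with the camera extrinsics and dense reconstruction.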
Problem

Research questions and friction points this paper is trying to address.

Joint camera-robot calibration and metric 3D reconstruction
Eliminating calibration patterns and scale ambiguity
Handling single and multi-camera robotic systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Patternless joint calibration and 3D reconstruction
Uses MASt3R foundation model for RGB pointmap extraction
Unified optimization for metric-scaled robot-aligned scenes