🤖 AI Summary
RGB vision tasks for robotics require metric-scale 3D scene representations aligned with the robot’s body frame; however, camera-to-robot extrinsic calibration and dense 3D reconstruction are typically treated separately, and conventional approaches either rely on calibration targets or suffer from scale ambiguity.
Method: We propose a target-free framework that jointly performs multi-camera-to-robot calibration and metric-scale reconstruction. For the first time, a 3D foundation model (MASt3R) is leveraged for cross-modal geometric alignment, enabling co-optimization of camera extrinsics and dense point-cloud reconstruction in a non-parametric framework that jointly fuses visual features and robot pose observations.
Contribution/Results: The method supports both monocular and multi-camera systems, achieves high-precision calibration—outperforming target-free and marker-based baselines—using fewer than ten input images across diverse datasets, and outputs scale-accurate, robot-frame-aligned 3D maps.
📝 Abstract
Robots often rely on RGB images for tasks like manipulation and navigation. However, reliable interaction typically requires a 3D scene representation that is metric-scaled and aligned with the robot reference frame. This depends on accurate camera-to-robot calibration and dense 3D reconstruction, tasks usually treated separately despite both relying on geometric correspondences from RGB data. Traditional calibration needs patterns, while RGB-based reconstruction yields geometry with an unknown scale in an arbitrary frame. Multi-camera setups add further complexity, as data must be expressed in a shared reference frame. We present Calib3R, a patternless method that jointly performs camera-to-robot calibration and metric-scaled 3D reconstruction via unified optimization. Calib3R handles single- and multi-camera setups on robot arms or mobile robots. It builds on the 3D foundation model MASt3R to extract pointmaps from RGB images, which are combined with robot poses to reconstruct a scaled 3D scene aligned with the robot. Experiments on diverse datasets show that Calib3R achieves accurate calibration with fewer than 10 images, outperforming target-less and marker-based methods.
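Calib3R's actual joint optimization is not detailed here, but the core idea it relies on, recovering metric scale and a robot-frame alignment by registering up-to-scale reconstruction geometry against known robot poses, can be sketched with a closed-form Umeyama similarity alignment. The sketch below is illustrative only: the function name and the use of camera centers vs. robot-reported positions are assumptions, not the paper's method, which additionally fuses dense MASt3R pointmaps.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Closed-form similarity transform (s, R, t) such that dst ~= s * R @ src + t.

    src: (N, 3) points in the arbitrary, scale-free reconstruction frame
         (e.g. camera centers recovered from MASt3R pointmaps).
    dst: (N, 3) corresponding points in the metric robot frame
         (e.g. positions reported by the robot's kinematics).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d

    # Cross-covariance between the two centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection solution.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)   # total variance of source points
    s = np.trace(np.diag(D) @ S) / var_src    # metric scale factor
    t = mu_d - s * R @ mu_s
    return s, R, t

if __name__ == "__main__":
    # Synthetic check: recover a known scale/rotation/translation.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(20, 3))
    ang = 0.3
    R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                       [np.sin(ang),  np.cos(ang), 0.0],
                       [0.0,          0.0,         1.0]])
    s_true, t_true = 2.5, np.array([0.1, -0.2, 0.3])
    dst = s_true * src @ R_true.T + t_true
    s, R, t = umeyama_alignment(src, dst)
    print(f"scale ~ {s:.3f}")
```

With exact correspondences the recovery is exact; in practice a method like Calib3R would refine such an initialization jointly with the dense reconstruction rather than rely on sparse camera centers alone.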