Bi-Manual Joint Camera Calibration and Scene Representation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-camera extrinsic calibration for multi-robot manipulation typically relies on calibration boards and cumbersome manual procedures. Method: This paper proposes a markerless joint calibration and 3D mapping framework tailored to dual-arm systems. An end-to-end joint optimization simultaneously estimates the camera-to-end-effector poses, the inter-arm relative pose, and a scale-consistent unified 3D scene representation. Crucially, a 3D foundation model provides dense, markerless cross-view correspondences, which are combined with multi-view geometric constraints and collaborative representation learning over the RGB multi-camera images, and solved via nonlinear least-squares optimization. Results: The method is robust across diverse tabletop scenes. The resulting unified 3D representation directly supports real-time collision detection and semantic segmentation, improving deployment efficiency for dual-arm cooperative tasks by eliminating physical calibration targets and streamlining system initialization.

📝 Abstract
Robot manipulation, especially bimanual manipulation, often requires setting up multiple cameras on multiple robot manipulators. Before robot manipulators can generate motion or even build representations of their environments, the cameras rigidly mounted to the robot need to be calibrated. Camera calibration is a cumbersome process involving collecting a set of images, with each capturing a pre-determined marker. In this work, we introduce the Bi-Manual Joint Calibration and Representation Framework (Bi-JCR). Bi-JCR enables multiple robot manipulators, each with cameras mounted, to circumvent taking images of calibration markers. By leveraging 3D foundation models for dense, marker-free multi-view correspondence, Bi-JCR jointly estimates: (i) the extrinsic transformation from each camera to its end-effector, (ii) the inter-arm relative poses between manipulators, and (iii) a unified, scale-consistent 3D representation of the shared workspace, all from the same captured RGB image sets. The representation, jointly constructed from images captured by cameras on both manipulators, lives in a common coordinate frame and supports collision checking and semantic segmentation to facilitate downstream bimanual coordination tasks. We empirically evaluate the robustness of Bi-JCR on a variety of tabletop environments, and demonstrate its applicability on a variety of downstream tasks.
Problem

Research questions and friction points this paper is trying to address.

Calibrate multiple cameras on bimanual robots without markers
Estimate camera poses and inter-arm transformations jointly
Build unified 3D workspace representation for collision checking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages 3D foundation models for dense correspondence
Jointly estimates camera and manipulator transformations
Constructs unified 3D workspace representation
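A core sub-problem behind the scale-consistent unified representation: reconstructions produced from a 3D foundation model are only defined up to scale, so fusing the two arms' reconstructions into one workspace frame requires estimating a similarity transform (scale, rotation, translation) from dense cross-view correspondences. Below is a minimal numpy sketch of Umeyama-style similarity alignment between corresponding 3D points; the function name and the closed-form approach are illustrative assumptions, not the paper's actual optimization.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate a similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points, e.g. dense matches
    (from a 3D foundation model) expressed in each arm's reconstruction frame.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force a proper rotation (det(R) = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    # Scale from singular values over the source variance; then translation.
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In the full framework, an alignment like this would be only one term inside the joint nonlinear least-squares problem, optimized together with the camera-to-end-effector extrinsics and the inter-arm relative pose.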
👥 Authors
Haozhan Tang (Robotics Institute, Carnegie Mellon University, Pittsburgh, USA)
Tianyi Zhang (Robotics Institute, Carnegie Mellon University, Pittsburgh, USA)
Matthew Johnson-Roberson (Professor of Robotics, Carnegie Mellon University; Robotics, Field Robotics, Autonomous Vehicles, Marine Robotics)
Weiming Zhi (Robotics Institute, Carnegie Mellon University, Pittsburgh, USA)