MultiCam: On-the-fly Multi-Camera Pose Estimation Using Spatiotemporal Overlaps of Known Objects

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional marker-based pose estimation in multi-camera dynamic augmented reality systems, which rely on continuously visible fiducials and struggle with collaborative localization across non-overlapping camera views. The authors propose a novel markerless approach for dynamic multi-camera pose estimation that leverages spatiotemporal overlaps of known objects in the scene to construct and incrementally update a spatiotemporal scene graph, enabling cross-camera pose co-optimization. By fusing multi-view object observations within this graph structure, the method significantly enhances pose accuracy. Extensive experiments on YCB-V, T-LESS, and a newly introduced multi-camera, multi-object overlapping dataset demonstrate consistent superiority over existing methods, validating the approach’s effectiveness and robustness for markerless AR applications.
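The summary above describes building a spatiotemporal scene graph from per-camera object-pose observations so that cameras can be related through shared objects. As a minimal illustrative sketch (not the authors' implementation; class and method names here are hypothetical), cameras and objects can be graph nodes, each observation an edge storing the object pose in that camera's frame, and two cameras become registered whenever they observe the same object close enough in time:

```python
# Hedged sketch of object-based multi-camera registration: cameras and objects
# are nodes, edges store a 4x4 object pose T_cam_obj at a timestamp, and the
# relative camera pose follows from a temporally overlapping shared observation.
from collections import defaultdict, deque
import numpy as np

class SceneGraph:
    def __init__(self):
        # observations[(camera, object)] -> list of (timestamp, T_cam_obj)
        self.obs = defaultdict(list)

    def add_observation(self, cam, obj, t, T_cam_obj):
        self.obs[(cam, obj)].append((t, np.asarray(T_cam_obj, dtype=float)))

    def relative_pose(self, cam_a, cam_b, obj, max_dt=0.1):
        """T_a_b from a shared object seen by both cameras within max_dt seconds."""
        best = None
        for ta, Ta in self.obs[(cam_a, obj)]:
            for tb, Tb in self.obs[(cam_b, obj)]:
                dt = abs(ta - tb)
                if dt <= max_dt and (best is None or dt < best[0]):
                    # T_a_b = T_a_obj @ inv(T_b_obj)
                    best = (dt, Ta @ np.linalg.inv(Tb))
        return None if best is None else best[1]

    def register_cameras(self, cams, objs, root):
        """BFS over pairwise constraints: express every reachable camera in root's frame."""
        poses = {root: np.eye(4)}
        queue = deque([root])
        while queue:
            a = queue.popleft()
            for b in cams:
                if b in poses:
                    continue
                for o in objs:
                    T_a_b = self.relative_pose(a, b, o)
                    if T_a_b is not None:
                        poses[b] = poses[a] @ T_a_b  # chain T_root_b = T_root_a @ T_a_b
                        queue.append(b)
                        break
        return poses
```

Because registration chains through any shared object, two cameras with non-overlapping fields of view can still end up in a common frame via a third camera or a moving object seen by both at different times, which is the intuition behind the temporal part of the graph.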

📝 Abstract
Multi-camera dynamic Augmented Reality (AR) applications require camera pose estimation to combine the individual information from each camera into one common system. This can be achieved by sharing contextual information, such as markers or objects, across multiple views. While cameras are commonly calibrated in an initial step or continuously updated through markers, another option is to leverage information already present in the scene, such as known objects. A further downside of marker-based tracking is that markers must remain inside the field of view (FoV) of the cameras. To overcome these limitations, we propose continuous dynamic camera pose estimation that leverages spatiotemporal FoV overlaps of known objects on the fly. To achieve this, we enhance a state-of-the-art object pose estimator to update our spatiotemporal scene graph, relating even cameras with non-overlapping FoVs. To evaluate our approach, we introduce a multi-camera, multi-object pose estimation dataset with temporal FoV overlap, including static and dynamic cameras. Furthermore, in FoV-overlapping scenarios, we outperform the state of the art in camera pose accuracy on the widely used YCB-V and T-LESS datasets. Our performance on both existing and our proposed datasets validates the effectiveness of our marker-less approach for AR applications. The code and dataset are available at https://github.com/roth-hex-lab/IEEE-VR-2026-MultiCam.
Problem

Research questions and friction points this paper is trying to address.

multi-camera pose estimation
marker-less tracking
spatiotemporal overlap
augmented reality
object-based localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-camera pose estimation
spatiotemporal overlap
marker-less tracking
scene graph
augmented reality
Shiyu Li
Technische Universität München, Human-Centered-Computing and Extended Reality Lab, Klinikum rechts der Isar, Orthopedics and Sports Orthopedics, Munich Institute of Robotics and Machine Intelligence (MIRMI), Germany
Hannah Schieber
HEX-Lab @ TU Munich
Computer Vision, Augmented Reality, Virtual Reality, 3D Reconstruction, Semantic Segmentation
Kristoffer Waldow
TH Köln, Computer Graphics Group and Technische Universität München, Human-Centered-Computing and Extended Reality Lab, Germany
Benjamin Busam
Technical University of Munich
Photogrammetry, Computer Vision, Machine Learning, Sensor Fusion, Embodied AI
Julian Kreimeier
Technische Universität München, Human-Centered-Computing and Extended Reality Lab, Klinikum rechts der Isar, Orthopedics and Sports Orthopedics, Munich Institute of Robotics and Machine Intelligence (MIRMI), Germany
Daniel Roth
Technical University of Munich
Human-Centered Computing, Extended Reality, Artificial Intelligence, Robotics, Digital Health