MultiCam: On-the-fly Multi-Camera Pose Estimation Using Spatiotemporal Overlaps of Known Objects

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional marker-based pose estimation in multi-camera dynamic augmented reality systems, which rely on continuously visible fiducials and struggle with collaborative localization across non-overlapping camera views. The authors propose a novel markerless approach for dynamic multi-camera pose estimation that leverages spatiotemporal overlaps of known objects in the scene to construct and incrementally update a spatiotemporal scene graph, enabling cross-camera pose co-optimization. By fusing multi-view object observations within this graph structure, the method significantly enhances pose accuracy. Extensive experiments on YCB-V, T-LESS, and a newly introduced multi-camera, multi-object overlapping dataset demonstrate consistent superiority over existing methods, validating the approach’s effectiveness and robustness for markerless AR applications.
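The summary above describes building a spatiotemporal scene graph from per-camera object-pose observations so that cameras can be related through shared objects. As a minimal illustrative sketch (not the authors' implementation; class and method names here are hypothetical), cameras and objects can be graph nodes, each observation an edge storing the object pose in that camera's frame, and two cameras become registered whenever they observe the same object close enough in time:

```python
# Hedged sketch of object-based multi-camera registration: cameras and objects
# are nodes, edges store a 4x4 object pose T_cam_obj at a timestamp, and the
# relative camera pose follows from a temporally overlapping shared observation.
from collections import defaultdict, deque
import numpy as np

class SceneGraph:
    def __init__(self):
        # observations[(camera, object)] -> list of (timestamp, T_cam_obj)
        self.obs = defaultdict(list)

    def add_observation(self, cam, obj, t, T_cam_obj):
        self.obs[(cam, obj)].append((t, np.asarray(T_cam_obj, dtype=float)))

    def relative_pose(self, cam_a, cam_b, obj, max_dt=0.1):
        """T_a_b from a shared object seen by both cameras within max_dt seconds."""
        best = None
        for ta, Ta in self.obs[(cam_a, obj)]:
            for tb, Tb in self.obs[(cam_b, obj)]:
                dt = abs(ta - tb)
                if dt <= max_dt and (best is None or dt < best[0]):
                    # T_a_b = T_a_obj @ inv(T_b_obj)
                    best = (dt, Ta @ np.linalg.inv(Tb))
        return None if best is None else best[1]

    def register_cameras(self, cams, objs, root):
        """BFS over pairwise constraints: express every reachable camera in root's frame."""
        poses = {root: np.eye(4)}
        queue = deque([root])
        while queue:
            a = queue.popleft()
            for b in cams:
                if b in poses:
                    continue
                for o in objs:
                    T_a_b = self.relative_pose(a, b, o)
                    if T_a_b is not None:
                        poses[b] = poses[a] @ T_a_b  # chain T_root_b = T_root_a @ T_a_b
                        queue.append(b)
                        break
        return poses
```

Because registration chains through any shared object, two cameras with non-overlapping fields of view can still end up in a common frame via a third camera or a moving object seen by both at different times, which is the intuition behind the temporal part of the graph.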

📝 Abstract
Multi-camera dynamic Augmented Reality (AR) applications require camera pose estimation to combine the individual information from each camera into one common system. This can be achieved by sharing contextual information, such as markers or objects, across multiple views. While cameras are commonly calibrated in an initial step or continuously updated through markers, another option is to leverage information already present in the scene, such as known objects. A further downside of marker-based tracking is that markers must remain inside the field of view (FoV) of the cameras. To overcome these limitations, we propose continuous dynamic camera pose estimation that leverages spatiotemporal FoV overlaps of known objects on the fly. To achieve this, we enhance a state-of-the-art object pose estimator to update our spatiotemporal scene graph, relating even cameras with non-overlapping FoVs. To evaluate our approach, we introduce a multi-camera, multi-object pose estimation dataset with temporal FoV overlap, including static and dynamic cameras. Furthermore, in FoV-overlapping scenarios, we outperform the state of the art in camera pose accuracy on the widely used YCB-V and T-LESS datasets. Our performance on both existing and our proposed datasets validates the effectiveness of our marker-less approach for AR applications. The code and dataset are available at https://github.com/roth-hex-lab/IEEE-VR-2026-MultiCam.
Problem

Research questions and friction points this paper is trying to address.

multi-camera pose estimation
marker-less tracking
spatiotemporal overlap
augmented reality
object-based localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-camera pose estimation
spatiotemporal overlap
marker-less tracking
scene graph
augmented reality
Shiyu Li
Technische Universität München, Human-Centered-Computing and Extended Reality Lab, Klinikum rechts der Isar, Orthopedics and Sports Orthopedics, Munich Institute of Robotics and Machine Intelligence (MIRMI), Germany
Hannah Schieber
HEX-Lab @ TU Munich
Computer Vision, Augmented Reality, Virtual Reality, 3D Reconstruction, Semantic Segmentation
Kristoffer Waldow
TH Köln, Computer Graphics Group and Technische Universität München, Human-Centered-Computing and Extended Reality Lab, Germany
Benjamin Busam
Technical University of Munich
Photogrammetry, Computer Vision, Machine Learning, Sensor Fusion, Embodied AI
Julian Kreimeier
Technische Universität München, Human-Centered-Computing and Extended Reality Lab, Klinikum rechts der Isar, Orthopedics and Sports Orthopedics, Munich Institute of Robotics and Machine Intelligence (MIRMI), Germany
Daniel Roth
Technical University of Munich
Human-Centered Computing, Extended Reality, Artificial Intelligence, Robotics, Digital Health