🤖 AI Summary
To address two limitations, the motion blur that RGB cameras suffer under high-speed motion and low-light conditions, and the poor generalization of event-camera-based 6DoF pose estimation methods to unknown objects, this paper proposes the first end-to-end multimodal RGB-event fusion framework. Our method introduces three key innovations: (1) an adaptive pose memory queue for temporal modeling; (2) an object-centric 2D tracker that enforces geometric consistency; and (3) a ray-based pose filter that integrates template-free feature matching with ray-space optimization. We further introduce MoCapCube6D, the first multimodal benchmark dataset tailored to fast-motion 6DoF pose estimation. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in high-speed scenarios while enabling accurate zero-shot 6DoF pose estimation for previously unseen moving objects.
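The summary does not include code, so the following is only a minimal Python sketch of the idea behind a pose memory queue: keep a fixed-length history of recent orientations and fuse each new, possibly noisy estimate with the historical mean for temporal consistency. The class name, parameters (`PoseMemoryQueue`, `history_weight`), and the quaternion-averaging scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a pose memory queue; not the paper's actual component.
from collections import deque
import numpy as np

class PoseMemoryQueue:
    def __init__(self, maxlen=8, history_weight=0.5):
        self.queue = deque(maxlen=maxlen)     # recent unit quaternions (w, x, y, z)
        self.history_weight = history_weight  # how strongly history smooths the estimate

    def _mean_quaternion(self):
        # Sign-align all stored quaternions to the most recent one,
        # then average and renormalize back onto the unit sphere.
        ref = self.queue[-1]
        qs = np.array([q if np.dot(q, ref) >= 0 else -q for q in self.queue])
        mean = qs.mean(axis=0)
        return mean / np.linalg.norm(mean)

    def update(self, q_est):
        # Blend the new estimate with the historical mean for temporal consistency.
        q_est = np.asarray(q_est, dtype=float)
        q_est /= np.linalg.norm(q_est)
        if self.queue:
            q_hist = self._mean_quaternion()
            if np.dot(q_est, q_hist) < 0:
                q_hist = -q_hist
            q_fused = (1 - self.history_weight) * q_est + self.history_weight * q_hist
            q_fused /= np.linalg.norm(q_fused)
        else:
            q_fused = q_est
        self.queue.append(q_fused)
        return q_fused
```

A real system would presumably make the blending weight adaptive, for example driven by tracking confidence, which may be what "adaptive" refers to; the summary does not specify the mechanism.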
📝 Abstract
Six-degree-of-freedom (6DoF) pose estimation for novel objects is a critical task in computer vision, yet it faces significant challenges in high-speed and low-light scenarios where standard RGB cameras suffer from motion blur. While event cameras offer a promising solution due to their high temporal resolution, current 6DoF pose estimation methods typically yield suboptimal performance when objects move at high speed. To close this gap, we propose PoseStreamer, a robust multi-modal 6DoF pose estimation framework designed specifically for high-speed motion scenarios. Our approach integrates three core components: an Adaptive Pose Memory Queue that utilizes historical orientation cues for temporal consistency, an Object-centric 2D Tracker that provides strong 2D priors to boost 3D center recall, and a Ray Pose Filter for geometric refinement along camera rays. Furthermore, we introduce MoCapCube6D, a novel multi-modal dataset constructed to benchmark performance under rapid motion. Extensive experiments demonstrate that PoseStreamer not only achieves superior accuracy in high-speed motion scenarios but also exhibits strong generalization as a template-free framework for unseen moving objects.
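As a companion sketch, here is one plausible reading of the geometric core of "refinement along camera rays": if the Object-centric 2D Tracker supplies a reliable 2D object center, the 3D center is constrained to the camera ray through that pixel, so a coarse translation estimate can be snapped onto that ray. The function names and the closed-form projection are assumptions for illustration; the paper's actual Ray Pose Filter also involves template-free feature matching, which is omitted here.

```python
# Hypothetical sketch of ray-space translation refinement; not the paper's filter.
import numpy as np

def backproject_ray(K, center_2d):
    """Unit ray (camera frame) through pixel `center_2d` for intrinsics K."""
    uv1 = np.array([center_2d[0], center_2d[1], 1.0])
    ray = np.linalg.inv(K) @ uv1
    return ray / np.linalg.norm(ray)

def refine_translation_on_ray(K, center_2d, t_coarse):
    """Project a coarse translation onto the 2D-tracker ray (closest point on the ray)."""
    ray = backproject_ray(K, center_2d)
    depth = float(np.dot(t_coarse, ray))  # orthogonal projection gives the depth along the ray
    return depth * ray

# Toy usage with made-up intrinsics and estimates:
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
t_refined = refine_translation_on_ray(K, center_2d=(350, 260),
                                      t_coarse=np.array([0.05, 0.03, 1.20]))
```

The appeal of this formulation is that a strong 2D prior collapses the translation search from three dimensions to a single depth along the ray, which is consistent with the paper's claim that the 2D tracker boosts 3D center recall.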