🤖 AI Summary
High-speed mobile robots suffer severe degradation in RGB-camera-based 6D pose estimation and moving object detection under high-dynamic conditions due to motion blur and latency. Method: We introduce the first multi-task event-camera benchmark specifically designed for high-speed dynamic environments, comprising 75 challenging real-world scenes. It supports long-range (>5 m), high-velocity (5–10 m/s) 6D pose estimation and moving object detection, and uniquely incorporates extreme viewpoints, strong illumination variations, and complex occlusions. Synchronized stereo event-camera and RGB data are collected, and cross-modal evaluation is conducted with baselines including FoundationPose. Contribution/Results: Experiments show that RGB-only methods achieve only 0.22 Average Recall for 6D pose estimation, whereas event-based approaches significantly improve robustness and real-time performance. The dataset is publicly released, establishing the first open benchmark for multi-task high-speed robotic vision.
📝 Abstract
Mobile robots are reaching unprecedented speeds, with platforms like Unitree B2 and Fraunhofer O3dyn achieving maximum speeds between 5 and 10 m/s. However, effectively utilizing such speeds remains a challenge due to the limitations of RGB cameras, which suffer from motion blur and fail to provide real-time responsiveness. Event cameras, with their asynchronous operation and low-latency sensing, offer a promising alternative for high-speed robotic perception. In this work, we introduce MTevent, a dataset designed for 6D pose estimation and moving object detection in highly dynamic environments with large detection distances. Our setup consists of a stereo event camera and an RGB camera, capturing 75 scenes, each lasting 16 seconds on average and featuring 16 unique objects under challenging conditions such as extreme viewing angles, varying lighting, and occlusions. MTevent is the first dataset to combine high-speed motion, long-range perception, and real-world object interactions, making it a valuable resource for advancing event-based vision in robotics. To establish a baseline, we evaluate the task of 6D pose estimation using NVIDIA's FoundationPose on RGB images, achieving an Average Recall of 0.22 with ground-truth masks, highlighting the limitations of RGB-based approaches in such dynamic settings. With MTevent, we provide a novel resource to improve perception models and foster further research in high-speed robotic vision. The dataset is available for download at https://huggingface.co/datasets/anas-gouda/MTevent
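The Average Recall figure quoted above can be illustrated with a minimal, BOP-style sketch: recall is computed as the fraction of pose estimates whose error falls below a threshold, and AR averages that recall over a set of thresholds. This is a simplified stand-in for intuition only, not the exact MTevent evaluation protocol; the function name, thresholds, and sample error values below are hypothetical.

```python
import numpy as np

def average_recall(pose_errors, thresholds):
    """BOP-style Average Recall sketch (illustrative, not the paper's protocol).

    pose_errors: per-estimate pose error values (e.g. ADD, in metres).
    thresholds:  error thresholds; recall at a threshold is the fraction
                 of estimates whose error is below it.
    Returns the mean recall across all thresholds.
    """
    errors = np.asarray(pose_errors, dtype=float)
    recalls = [(errors < t).mean() for t in thresholds]
    return float(np.mean(recalls))

# Hypothetical per-object errors, purely for illustration:
errs = [0.01, 0.03, 0.08, 0.20, 0.50]
ar = average_recall(errs, thresholds=[0.02, 0.05, 0.10])
# Recalls are 0.2, 0.4, 0.6 at the three thresholds, so AR = 0.4.
```

A low AR such as 0.22 thus means that, averaged over thresholds, only about a fifth of the estimated poses fall within the accepted error bounds.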