SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 6D pose estimation datasets (e.g., LM-O, YCB-V) are captured under fixed illumination and sensor settings, so they fail to reflect the real-world performance degradation caused by dynamic variations in exposure, gain, and depth-sensing mode, and they overlook the potential of test-time sensor adaptation. Method: We introduce the first RGB-D benchmark explicitly designed for perception robustness, systematically covering 13 exposure levels, 9 gain settings, and 4 depth-sensing modes, including all their combinations. Crucially, we extend 6D pose evaluation from static, data-driven paradigms to dynamic sensor-configuration optimization, enabling joint RGB-D adaptive control at inference time. Results: Experiments demonstrate that this strategy significantly improves model generalization, outperforming conventional data augmentation and rivaling large-scale dataset expansion. The strongest gains arise when the RGB and depth modalities are co-adapted, underscoring the value of cross-modal sensor orchestration for robust pose estimation.
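To make the idea of test-time sensor-configuration optimization concrete, here is a minimal grid-search sketch over the benchmark's three sensor axes. The `camera.capture` and `pose_model.infer` APIs, the specific exposure/gain values, and the use of model confidence as the selection criterion are all hypothetical stand-ins, not the authors' actual method:

```python
from itertools import product

# Illustrative sweep ranges mirroring the benchmark's axes
# (13 exposures, 9 gains, 4 depth modes); actual values are assumptions.
EXPOSURES = [50 * 2**i for i in range(13)]   # exposure times, illustrative
GAINS = list(range(9))                       # discrete gain indices
DEPTH_MODES = ["default", "high_accuracy", "high_density", "medium_density"]

def select_sensor_config(camera, pose_model):
    """Grid-search RGB-D sensor settings at test time, keeping the
    configuration whose pose estimate the model is most confident in.
    `camera` and `pose_model` are assumed interfaces."""
    best_config, best_conf = None, float("-inf")
    for exposure, gain, depth_mode in product(EXPOSURES, GAINS, DEPTH_MODES):
        rgb, depth = camera.capture(exposure=exposure, gain=gain,
                                    depth_mode=depth_mode)
        pose, confidence = pose_model.infer(rgb, depth)
        if confidence > best_conf:
            best_config, best_conf = (exposure, gain, depth_mode, pose), confidence
    return best_config
```

In practice an exhaustive sweep of all 468 RGB-D combinations per frame would be slow, so a deployed controller would likely use a cheaper search or a learned policy; the sketch only illustrates the joint RGB-D adaptation the summary describes.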

📝 Abstract
Recent advances in 6D object pose estimation have achieved high performance on representative benchmarks such as LM-O, YCB-V, and T-LESS. However, these datasets were captured under fixed illumination and camera settings, leaving the impact of real-world variations in illumination, exposure, gain, or depth-sensor mode, and the potential of test-time sensor control to mitigate such variations, largely unexplored. To bridge this gap, we introduce SenseShift6D, the first RGB-D dataset that physically sweeps 13 RGB exposures, 9 RGB gains, auto-exposure, 4 depth-capture modes, and 5 illumination levels. For three common household objects (spray, pringles, and tincase), we acquire 101.9k RGB and 10k depth images, providing 1,380 unique sensor-lighting permutations per object pose. Experiments with state-of-the-art models on our dataset show that applying sensor control at test time yields greater performance improvements than digital data augmentation, achieving performance comparable to or better than costly increases in the quantity and diversity of real-world training data. Adapting either the RGB or the depth sensor individually is effective, while jointly adapting the multimodal RGB-D configuration yields even greater improvements. SenseShift6D extends the 6D-pose evaluation paradigm from data-centered to sensor-aware robustness, laying a foundation for adaptive, self-tuning perception systems capable of operating robustly in uncertain real-world environments. Our dataset is available at huggingface.co/datasets/Yegyu/SenseShift6D, and associated scripts can be found at github.com/yegyu-han/SenseShift6D
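Since the abstract points to a Hugging Face release, a minimal download sketch using the `huggingface_hub` library is shown below. The repo id comes from the abstract; the on-disk layout and annotation format are not specified here and would need to be checked against the actual release:

```python
from huggingface_hub import snapshot_download

# Download the SenseShift6D dataset from the Hugging Face Hub.
# Repo id taken from the abstract; file layout is not assumed.
local_dir = snapshot_download(repo_id="Yegyu/SenseShift6D",
                              repo_type="dataset")
print(f"Dataset downloaded to: {local_dir}")
```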
Problem

Research questions and friction points this paper is trying to address.

Evaluates 6D pose estimation robustness under varying sensor and lighting conditions
Introduces first RGB-D dataset with diverse sensor-lighting permutations
Explores test-time sensor control to improve real-world performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces multimodal RGB-D dataset with sensor variations
Uses test-time sensor control for performance improvement
Adapts RGB and depth sensors jointly for robustness
Yegyu Han
Graduate School of Data Science, Seoul National University
Taegyoon Yoon
Graduate School of Data Science, Seoul National University
Dayeon Woo
Graduate School of Data Science, Seoul National University
Sojeong Kim
Department of EECS, Gwangju Institute of Science and Technology
Hyung-Sin Kim
Graduate School of Data Science, Seoul National University
On-device AI · Machine learning · Computer vision · Internet of Things