CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 6D pose estimation benchmarks primarily target domestic scenes or simplified, manually arranged industrial environments, failing to reflect the complex challenges—such as severe occlusion, fine-grained distractors, and multi-sensor discrepancies—encountered in real-world robotic manipulation. To address this gap, we introduce CHIP, the first multi-sensor 6D pose benchmark explicitly designed for robotic-arm manipulation of chairs in authentic industrial settings. CHIP comprises 77,811 RGB-D frames with precise 6D ground truth (averaging 11,115 frames per chair) across seven real chair categories; ground truth is derived automatically via robot forward kinematics in unstructured, non-desktop, production-line environments. The benchmark enables evaluation of generalization across sensors, occlusion levels, and zero-prior conditions. Extensive baseline experiments reveal that state-of-the-art zero-shot methods suffer significant performance degradation in industrial contexts, confirming that CHIP poses a substantial challenge and is of practical relevance.

📝 Abstract
Accurate 6D pose estimation of complex objects in 3D environments is essential for effective robotic manipulation. Yet, existing benchmarks fall short in evaluating 6D pose estimation methods under realistic industrial conditions, as most datasets focus on household objects in domestic settings, while the few available industrial datasets are limited to artificial setups with objects placed on tables. To bridge this gap, we introduce CHIP, the first dataset designed for 6D pose estimation of chairs manipulated by a robotic arm in a real-world industrial environment. CHIP includes seven distinct chairs captured using three different RGBD sensing technologies and presents unique challenges, such as distractor objects with fine-grained differences and severe occlusions caused by the robotic arm and human operators. CHIP comprises 77,811 RGBD images annotated with ground-truth 6D poses automatically derived from the robot's kinematics, averaging 11,115 annotations per chair. We benchmark CHIP using three zero-shot 6D pose estimation methods, assessing performance across different sensor types, localization priors, and occlusion levels. Results show substantial room for improvement, highlighting the unique challenges posed by the dataset. CHIP will be publicly released.
Problem

Research questions and friction points this paper is trying to address.

Lack of industrial datasets for 6D pose estimation in real-world settings
Need for accurate chair pose estimation in robotic manipulation scenarios
Challenges from occlusions and fine-grained distractors in industrial environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-sensor RGBD dataset for industrial chairs
Automated 6D pose annotation via robot kinematics
Benchmarked with zero-shot pose estimation methods
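The kinematics-based annotation idea above can be sketched as a chain of homogeneous transforms: given the camera's calibrated pose in the robot-base frame, the end-effector pose from forward kinematics, and the grasped object's fixed offset from the end-effector, the object's 6D pose in the camera frame follows by composition. This is a minimal illustration, not the paper's actual pipeline; all transform values below are hypothetical placeholders.

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical calibrated transforms (identity rotations for illustration only):
T_base_cam = make_T(np.eye(3), np.array([1.0, 0.0, 0.5]))   # camera pose in robot-base frame (extrinsic calibration)
T_base_ee  = make_T(np.eye(3), np.array([0.2, 0.1, 0.8]))   # end-effector pose from forward kinematics
T_ee_obj   = make_T(np.eye(3), np.array([0.0, 0.0, 0.15]))  # grasped object's fixed offset from the end-effector

# Object pose in the camera frame: invert the camera extrinsics, then chain the kinematic transforms.
T_cam_obj = np.linalg.inv(T_base_cam) @ T_base_ee @ T_ee_obj
print(T_cam_obj[:3, 3])  # object translation in camera coordinates -> [-0.8  0.1  0.45]
```

With real calibration and joint encoder readings in place of the placeholders, this composition yields a 6D ground-truth label per frame without manual annotation.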
Mattia Nardon (FBK-TeV)
Mikel Mujika Agirre (Ikerlan)
Ander González Tomé (Ikerlan)
Daniel Sedano Algarabel (Ikerlan)
Josep Rueda Collell (Ikerlan)
Ana Paola Caro (Andreu World)
Andrea Caraffa (FBK-TeV)
Fabio Poiesi (Fondazione Bruno Kessler) — Computer Vision
Paul Ian Chippendale (FBK-TeV)
Davide Boscaini (Fondazione Bruno Kessler) — Geometric Deep Learning, Computer Vision