JENGA: Object selection and pose estimation for robotic grasping from a stack

📅 2025-06-16
🤖 AI Summary
This work addresses autonomous grasp-target selection and accurate 6DoF pose estimation for robots operating on stacked, structured object formations (e.g., bricklaying, warehouse stacking), where objects in lower layers are partially occluded by those above. We define the joint problem of selecting suitable objects for grasping and estimating their 6DoF poses, and propose a camera-IMU based approach whose selection policy prioritizes unobstructed objects on the higher layers of a stack. To support systematic evaluation, we introduce a benchmark dataset for stacked scenes and an evaluation metric that combines object selection with pose accuracy. Experiments on this dataset show that the method performs well, although a completely error-free solution remains challenging. Finally, deployment in a real-world brick-picking application in a construction scenario demonstrates the approach's practicality.
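
The camera-IMU approach described above needs some notion of which objects sit on higher layers. As an illustration only, and not the authors' implementation, the sketch below shows one plausible use of the IMU: its gravity measurement gives an "up" direction that does not depend on camera tilt, so object centroids can be grouped into layers by their height along that axis. The function name `assign_layers`, the camera-frame inputs, and the nominal `layer_height` (e.g. one brick) are hypothetical assumptions.

```python
import numpy as np


def assign_layers(centroids_cam, gravity_cam, layer_height):
    """Group object centroids into stack layers along the IMU "up" axis.

    Hypothetical helper: assumes the IMU gravity vector has already been
    rotated into the camera frame and that all objects share one nominal height.

    centroids_cam: (N, 3) object centroids in the camera frame (metres).
    gravity_cam:   (3,) gravity vector expressed in the camera frame.
    layer_height:  nominal object height (metres), e.g. one brick.
    Returns an (N,) integer array where 0 is the lowest observed layer.
    """
    up = -np.asarray(gravity_cam, dtype=float)
    up /= np.linalg.norm(up)  # unit vector pointing opposite to gravity
    heights = np.asarray(centroids_cam, dtype=float) @ up  # height of each centroid along "up"
    # Quantize heights relative to the lowest object into whole layers.
    return np.round((heights - heights.min()) / layer_height).astype(int)
```

With a camera whose y axis points down (so gravity reads roughly `[0, 9.81, 0]`), a centroid 10 cm higher than the others lands in layer 1 when `layer_height=0.1`.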

📝 Abstract
Vision-based robotic object grasping is typically investigated in the context of isolated objects or unstructured object sets in bin picking scenarios. However, there are several settings, such as construction or warehouse automation, where a robot needs to interact with a structured object formation such as a stack. In this context, we define the problem of selecting suitable objects for grasping along with estimating an accurate 6DoF pose of these objects. To address this problem, we propose a camera-IMU based approach that prioritizes unobstructed objects on the higher layers of stacks and introduce a dataset for benchmarking and evaluation, along with a suitable evaluation metric that combines object selection with pose accuracy. Experimental results show that although our method can perform quite well, this is a challenging problem if a completely error-free solution is needed. Finally, we show results from the deployment of our method for a brick-picking application in a construction scenario.
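
The selection rule of prioritizing unobstructed objects on higher layers can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's policy: each object is hypothetically represented by an axis-aligned footprint in a gravity-aligned frame plus the height of its top face, an object counts as obstructed when any higher object overlaps its footprint, and the remaining candidates are ranked from highest to lowest. `StackObject` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StackObject:
    # Hypothetical representation in a gravity-aligned frame (metres).
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_top: float  # height of the object's top face along the "up" axis


def footprints_overlap(a: StackObject, b: StackObject) -> bool:
    # Strict inequalities: objects that only touch along an edge do not overlap.
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def is_unobstructed(obj: StackObject, objects: Sequence[StackObject]) -> bool:
    # Obstructed if any other object is higher and covers part of its footprint.
    return not any(
        other is not obj and other.z_top > obj.z_top and footprints_overlap(obj, other)
        for other in objects
    )


def select_grasp_candidates(objects: Sequence[StackObject]):
    # Keep unobstructed objects and rank them from the highest layer down.
    candidates = [o for o in objects if is_unobstructed(o, objects)]
    return sorted(candidates, key=lambda o: o.z_top, reverse=True)
```

For a bottom row of two bricks bridged by one brick on top, both bottom bricks are obstructed and the top brick is ranked first. A real stack would also need tolerances for sensor noise, which this sketch omits.
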

Problem

Research questions and friction points this paper is trying to address.

Selecting graspable objects from structured stacks
Estimating accurate 6DoF poses for robotic grasping
Prioritizing unobstructed objects in layered stack formations
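
Pose accuracy for a 6DoF estimate is commonly reported as a translation error and a rotation error. The sketch below assumes those standard definitions (Euclidean distance between translations, and the geodesic angle of the relative rotation R_est^T R_gt); the paper's exact error definitions are not given here, and `pose_errors` is a hypothetical helper.

```python
import numpy as np


def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (translation error in input units, rotation error in degrees).

    Hypothetical helper using the common geodesic rotation error;
    R_* are 3x3 rotation matrices, t_* are 3-vectors.
    """
    t_err = float(np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float)))
    # Relative rotation between estimate and ground truth.
    R_rel = np.asarray(R_est, float).T @ np.asarray(R_gt, float)
    # trace(R) = 1 + 2 cos(angle); clip to guard against rounding outside [-1, 1].
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = float(np.degrees(np.arccos(cos_angle)))
    return t_err, r_err
```

An estimate rotated 10 degrees about any axis and shifted 1 cm from the ground truth yields roughly `(0.01, 10.0)`.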
Innovation

Methods, ideas, or system contributions that make the work stand out.

Camera-IMU based object selection and 6DoF pose estimation
Prioritizes unobstructed objects on higher stack layers
Evaluation metric combining object selection with pose accuracy
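
One hypothetical way to fold object selection and pose accuracy into a single score, shown purely as an assumption and not as the paper's metric, is to count a selection as successful only when the chosen object is an acceptable grasp target and its pose error falls under both a translation and a rotation threshold. The `Selection` record, `valid_ids`, and the default thresholds (1 cm, 5 degrees) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import AbstractSet, Sequence


@dataclass
class Selection:
    # Hypothetical record for one selected object and its pose error.
    object_id: str
    t_err: float  # translation error against ground truth (metres)
    r_err: float  # rotation error against ground truth (degrees)


def combined_success_rate(selections: Sequence[Selection],
                          valid_ids: AbstractSet[str],
                          t_thresh: float = 0.01,
                          r_thresh: float = 5.0) -> float:
    """Fraction of selections that pick a valid object AND meet both pose thresholds."""
    if not selections:
        return 0.0
    hits = sum(
        1 for s in selections
        if s.object_id in valid_ids and s.t_err <= t_thresh and s.r_err <= r_thresh
    )
    return hits / len(selections)
```

Under this rule, picking an obstructed object with a perfect pose still counts as a failure, which is the property that distinguishes a combined metric from pose accuracy alone.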
Sai Srinivas Jeevanandam
German Research Center for Artificial Intelligence (DFKI)
Sandeep Inuganti
German Research Center for Artificial Intelligence (DFKI), RPTU Kaiserslautern
Shreedhar Govil
German Research Center for Artificial Intelligence (DFKI)
Didier Stricker
Professor for Computer Science, University Kaiserslautern
Augmented reality, computer vision, image processing, body sensor networks, HCI
Jason Rambach
Team Leader Spatial Sensing and Machine Perception, DFKI GmbH
Computer Vision, Pattern Recognition, Machine Learning, Signal Processing