Mind the Shape Gap: A Benchmark and Baseline for Deformation-Aware 6D Pose Estimation of Agricultural Produce

Date: 2026-03-28
AI Summary
Estimating 6D object poses for agricultural products is challenging due to their biological deformability and high intra-class shape variability. This work introduces PEAR, the first benchmark dataset providing ground-truth 6D poses and instance-level 3D deformations for eight categories of agricultural produce. Furthermore, we propose SEED, a unified framework that jointly predicts 6D pose and explicit lattice-based deformations using only RGB images and synthetic data, without requiring real 3D object models. SEED employs an end-to-end network architecture, explicit deformation modeling, and a UV-level texture-enhanced synthetic training strategy. Evaluated on the same RGB inputs, SEED outperforms MegaPose on six out of the eight product categories, demonstrating that explicit shape modeling is crucial for accurate pose estimation in agricultural harvesting robotics.
Abstract
Accurate 6D pose estimation for robotic harvesting is fundamentally hindered by the biological deformability and high intra-class shape variability of agricultural produce. Instance-level methods fail in this setting, as obtaining exact 3D models for every unique piece of produce is practically infeasible, while category-level approaches that rely on a fixed template suffer significant accuracy degradation when the prior deviates from the true instance geometry. To address this lack of robustness to deformation, we introduce PEAR (Pose and dEformation of Agricultural pRoduce), the first benchmark providing joint 6D pose and per-instance 3D deformation ground truth across 8 produce categories, acquired via a robotic manipulator for high annotation accuracy. Using PEAR, we show that state-of-the-art methods suffer up to 6x performance degradation when faced with the inherent geometric deviations of real-world produce. Motivated by this finding, we propose SEED (Simultaneous Estimation of posE and Deformation), a unified RGB-only framework that jointly predicts 6D pose and explicit lattice deformations from a single image across multiple produce categories. Trained entirely on synthetic data with generative texture augmentation applied at the UV level, SEED outperforms MegaPose on 6 out of 8 categories under identical RGB-only conditions, demonstrating that explicit shape modeling is a critical step toward reliable pose estimation in agricultural robotics.
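To make the idea of an explicit lattice deformation combined with a 6D pose more concrete, the following is a minimal illustrative sketch, not the SEED implementation. It assumes a generic free-form deformation: template points normalized to the unit cube are displaced by trilinearly interpolated offsets stored at the 8 corners of a single lattice cell, and the deformed shape is then placed with a rigid rotation and translation. The function names, the 2x2x2 lattice resolution, and the array layout are all hypothetical choices made for illustration; the abstract does not specify the lattice used by SEED.

```python
import numpy as np


def deform_with_lattice(points, offsets):
    """Displace points in the unit cube by trilinearly interpolated corner offsets.

    points:  (N, 3) array with coordinates in [0, 1] (assumed normalization).
    offsets: (2, 2, 2, 3) array, one displacement per cube corner (assumed layout).
    """
    points = np.asarray(points, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    displacement = np.zeros_like(points)
    # Accumulate each corner's offset, weighted by its trilinear weight.
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                wx = x if i else 1.0 - x
                wy = y if j else 1.0 - y
                wz = z if k else 1.0 - z
                weight = (wx * wy * wz)[:, None]
                displacement += weight * offsets[i, j, k]
    return points + displacement


def apply_pose(points, rotation, translation):
    """Apply a rigid 6D pose: rotate each point by `rotation`, then translate."""
    rotation = np.asarray(rotation, dtype=float)
    translation = np.asarray(translation, dtype=float)
    return np.asarray(points, dtype=float) @ rotation.T + translation


if __name__ == "__main__":
    template = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
    offsets = np.zeros((2, 2, 2, 3))
    offsets[1, 1, 1] = [0.2, 0.0, 0.0]  # pull one lattice corner along x
    deformed = deform_with_lattice(template, offsets)
    posed = apply_pose(deformed, np.eye(3), [0.0, 0.0, 1.0])
    print(posed)
```

In this sketch the deformation is applied in the object's canonical frame before the rigid transform, so shape and pose remain separate, explicitly interpretable quantities. A learned model would predict the offsets, rotation, and translation from an image, which is outside the scope of this example.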
Problem

Research questions and friction points this paper is trying to address.

6D pose estimation
agricultural produce
shape deformation
intra-class variability
robotic harvesting
Innovation

Methods, ideas, or system contributions that make the work stand out.

deformation-aware pose estimation
agricultural robotics
6D pose estimation
synthetic data training
shape modeling
Nikolas Chatzis
Robotics Institute, Athena Research Center, Marousi, Greece; HERON - Hellenic Robotics Center of Excellence, Athens, Greece; School of Electrical & Computer Engineering, NTUA, Greece
Angeliki Tsinouka
Robotics Institute, Athena Research Center, Marousi, Greece; HERON - Hellenic Robotics Center of Excellence, Athens, Greece
Katerina Papadimitriou
Robotics Institute, Athena Research Center, Marousi, Greece; HERON - Hellenic Robotics Center of Excellence, Athens, Greece; Department of Electrical & Computer Engineering, UTH, Greece
Niki Efthymiou
Athena Research Center, National Technical University of Athens
human-robot interaction, computer vision, machine learning
Marios Glytsos
Music Technology, New York University, USA
George Retsinas
ECE NTUA, ATHENA RIC, NCSR Demokritos
Computer Vision, Machine Learning
Paris Oikonomou
PhD Student, National Technical University of Athens
Robotics, Control and Automation, Robot Learning
Gerasimos Potamianos
University of Thessaly, Volos, Greece
Petros Maragos
Professor of Electrical and Computer Engineering, National Technical University of Athens
computer vision, signal processing, speech & language, machine learning, robotics
Panagiotis Paraskevas Filntisis
Postdoctoral Researcher, NTUA; Research Assistant, Athena RC
computer vision, machine learning, affective computing