AI Summary
Estimating 6D object poses for agricultural products is challenging due to their biological deformability and high intra-class shape variability. This work introduces PEAR, the first benchmark dataset providing ground-truth 6D poses and instance-level 3D deformations for eight categories of agricultural produce. Furthermore, we propose SEED, a unified framework that jointly predicts 6D pose and explicit lattice-based deformations using only RGB images and synthetic data, without requiring real 3D object models. SEED employs an end-to-end network architecture, explicit deformation modeling, and a UV-level texture-enhanced synthetic training strategy. Evaluated on the same RGB inputs, SEED outperforms MegaPose on six out of the eight product categories, demonstrating that explicit shape modeling is crucial for accurate pose estimation in agricultural harvesting robotics.
Abstract
Accurate 6D pose estimation for robotic harvesting is fundamentally hindered by the biological deformability and high intra-class shape variability of agricultural produce. Instance-level methods fail in this setting, as obtaining exact 3D models for every unique piece of produce is practically infeasible, while category-level approaches that rely on a fixed template suffer significant accuracy degradation when the prior deviates from the true instance geometry. To address this lack of robustness to deformation, we introduce PEAR (Pose and dEformation of Agricultural pRoduce), the first benchmark providing joint 6D pose and per-instance 3D deformation ground truth across 8 produce categories, acquired via a robotic manipulator for high annotation accuracy. Using PEAR, we show that state-of-the-art methods suffer up to 6x performance degradation when faced with the inherent geometric deviations of real-world produce. Motivated by this finding, we propose SEED (Simultaneous Estimation of posE and Deformation), a unified RGB-only framework that jointly predicts 6D pose and explicit lattice deformations from a single image across multiple produce categories. Trained entirely on synthetic data with generative texture augmentation applied at the UV level, SEED outperforms MegaPose on 6 out of 8 categories under identical RGB-only conditions, demonstrating that explicit shape modeling is a critical step toward reliable pose estimation in agricultural robotics.
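The abstract describes predicting explicit lattice deformations but does not give the parameterization. A minimal sketch of one common choice, Bernstein free-form deformation (FFD) of a template point cloud normalized to the unit cube, assuming the network would regress displaced control-point positions (the function names and lattice resolution here are illustrative, not SEED's actual interface):

```python
import numpy as np
from math import comb

def ffd_deform(points, lattice):
    """Deform points with a free-form deformation (FFD) control lattice.

    points:  (N, 3) array, coordinates normalized to the unit cube [0, 1]^3.
    lattice: (l, m, n, 3) array of control-point positions; an identity
             lattice (regular grid over the unit cube) leaves points unchanged.
    """
    l, m, n, _ = lattice.shape

    def bernstein(deg, t):
        # (N, deg + 1) matrix of Bernstein basis values B_{i,deg}(t).
        i = np.arange(deg + 1)
        coeff = np.array([comb(deg, k) for k in i])
        return coeff * t[:, None] ** i * (1.0 - t[:, None]) ** (deg - i)

    Bs = bernstein(l - 1, points[:, 0])
    Bt = bernstein(m - 1, points[:, 1])
    Bu = bernstein(n - 1, points[:, 2])
    # Tensor-product weighted sum over all control points.
    return np.einsum('ni,nj,nk,ijkd->nd', Bs, Bt, Bu, lattice)

def identity_lattice(l, m, n):
    """Regular (l, m, n) control grid over the unit cube (no deformation)."""
    xs, ys, zs = np.linspace(0, 1, l), np.linspace(0, 1, m), np.linspace(0, 1, n)
    X, Y, Z = np.meshgrid(xs, ys, zs, indexing='ij')
    return np.stack([X, Y, Z], axis=-1)
```

Regressing the low-dimensional lattice (here l*m*n*3 values) instead of per-vertex displacements keeps the deformation smooth and the network output compact, which is the usual motivation for explicit lattice-based shape modeling.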