🤖 AI Summary
This study addresses the failure of purely vision-based strategies in robotic disassembly tasks, where dense contacts, strong geometric constraints, or object deformability hinder performance. To tackle this challenge, the authors establish a comprehensive disassembly benchmark spanning both rigid and deformable objects in simulation and real-world settings, and introduce a unified reinforcement learning framework to systematically compare three perceptual configurations: vision only, vision combined with tactile RGB (TacRGB), and vision combined with tactile force fields (TacFF). Policies leveraging TacFF achieve the highest success rates in both simulated and physical environments, with especially pronounced advantages in scenarios involving high contact density and deformable materials; the results also demonstrate that naive multimodal fusion often dilutes informative tactile signals, underscoring the critical role of structured force sensing in contact-dominated tasks.
📝 Abstract
Robotic disassembly involves contact-rich interactions in which successful manipulation depends not only on geometric alignment but also on force-dependent state transitions. While vision-based policies perform well in structured settings, their reliability often degrades in tight-tolerance, contact-dominated, or deformable scenarios. In this work, we systematically investigate the role of tactile sensing in robotic disassembly through both simulation and real-world experiments. We construct five rigid-body disassembly tasks in simulation with increasing geometric constraints and extraction difficulty. We further design five real-world tasks, comprising three rigid and two deformable scenarios, to evaluate contact-dependent manipulation. Within a unified learning framework, we compare three sensing configurations: Vision Only, Vision + tactile RGB (TacRGB), and Vision + tactile force field (TacFF). Across both simulation and real-world experiments, TacFF-based policies consistently achieve the highest success rates, with particularly notable gains in contact-dependent and deformable settings. Moreover, naive fusion of TacRGB and TacFF underperforms either modality alone, indicating that simple concatenation can dilute task-relevant force information. Our results show that tactile sensing plays a critical, task-dependent role in robotic disassembly, with structured force-field representations being particularly effective in contact-dominated scenarios.