🤖 AI Summary
Precise visuo-tactile coordination is critical for disassembling flexible components—such as lithium-ion batteries—in recycling scenarios, yet existing methods suffer from poor multimodal fusion and limited generalization in contact-rich, unstructured environments.
Method: We propose a diffusion-based policy framework featuring cross-dimensional force–vision alignment. It is the first to effectively incorporate six-axis force feedback into diffusion policies, leveraging a multimodal feature alignment encoder for real-time tactile–visual observation fusion and enabling end-to-end skill learning.
Contribution/Results: Evaluated on real-world battery prying tasks, our method achieves a 96% success rate—outperforming vision-only baselines by 57%. Crucially, it supports zero-shot transfer to unseen battery types and objects without retraining. By addressing key bottlenecks in multimodal integration and generalization under dense physical interaction, the framework delivers a scalable, embodied intelligence solution for compliant manipulation of deformable objects.
📝 Abstract
The growing adoption of batteries in the electric vehicle industry and various consumer products has created an urgent need for effective recycling solutions. These products often contain a mix of compliant and rigid components, making robotic disassembly a critical step toward achieving scalable recycling processes. Diffusion policy has emerged as a promising approach for learning low-level skills in robotics. To effectively apply diffusion policy to contact-rich tasks, incorporating force as feedback is essential. In this paper, we apply diffusion policy with vision and force in a compliant object prying task. However, when combining low-dimensional contact force with high-dimensional image, the force information may be diluted. To address this issue, we propose a method that effectively integrates force with image data for diffusion policy observations. We validate our approach on a battery prying task that demands high precision and multi-step execution. Our model achieves a 96% success rate in diverse scenarios, marking a 57% improvement over the vision-only baseline. Our method also demonstrates zero-shot transfer capability to handle unseen objects and battery types. Supplementary videos and implementation codes are available on our project website. https://rros-lab.github.io/diffusion-with-force.github.io/