🤖 AI Summary
This work addresses two challenges in sustainable textile recycling, handling deformable garments and detecting foreign objects in cluttered scenes, by proposing a digital-twin-driven dual-arm robotic sorting system. The system integrates RGB-D vision, capacitive tactile feedback, grasp prediction, collision-aware motion planning, and semantic reasoning with vision-language models (VLMs) to classify garments in a realistic industrial setting. A benchmark of nine VLMs from five model families on 223 inspection scenarios shows that the Qwen family achieves the highest overall accuracy (up to 87.9%), while lighter models such as Gemma3 offer competitive speed-accuracy trade-offs for edge deployment. These results support the feasibility of the proposed system for scalable, autonomous textile sorting.
📝 Abstract
The increasing demand for sustainable textile recycling requires robust automation solutions capable of handling deformable garments and detecting foreign objects in cluttered environments. This work presents a digital-twin-driven robotic sorting system that integrates grasp prediction, multi-modal perception, and semantic reasoning for real-world textile classification. A dual-arm robotic cell equipped with RGB-D sensing, capacitive tactile feedback, and collision-aware motion planning autonomously separates garments from an unsorted basket, transfers them to an inspection zone, and classifies them using state-of-the-art Visual Language Models (VLMs). We benchmark nine VLMs from five model families on a dataset of 223 inspection scenarios comprising shirts, socks, trousers, underwear, foreign objects (including garments outside the aforementioned classes), and empty scenes. The evaluation assesses per-class accuracy, hallucination behavior, and computational performance under practical hardware constraints. Results show that the Qwen model family achieves the highest overall accuracy (up to 87.9%), with strong foreign-object detection performance, while lighter models such as Gemma3 offer competitive speed-accuracy trade-offs for edge deployment. A digital twin combined with MoveIt enables collision-aware path planning and integrates segmented 3D point clouds of inspected garments into the virtual environment for improved manipulation reliability. The presented system demonstrates the feasibility of combining semantic VLM reasoning with conventional grasp detection and digital twin technology for scalable, autonomous textile sorting in realistic industrial settings.