MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection

📅 2025-07-06
🤖 AI Summary
To address low accuracy and poor interpretability in fabric attribute recognition for textile manufacturing, apparel production, and intelligent retail, this paper proposes a multimodal large language model (MLLM)-driven robotic sorting system. The system integrates RGB vision, visuotactile, and pressure-sensing data into an end-to-end framework for fabric attribute understanding and decision-making. We introduce a novel multimodal explanation-guided knowledge distillation method combined with supervised fine-tuning, significantly improving both attribute ranking accuracy and decision interpretability. Our released Fabric-Llama-90B model outperforms state-of-the-art vision-language models on fabric attribute ranking and selection tasks. In addition, we open-source a multimodal dataset of 220 fabric samples with synchronized RGB, tactile, and pressure modalities, establishing a new benchmark and resource for MLLM research in embodied interaction scenarios.

📝 Abstract
Choosing the right fabric is crucial to meet functional and quality requirements in robotic applications for textile manufacturing, apparel production, and smart retail. We present MLLM-Fabric, a robotic framework powered by multimodal large language models (MLLMs) for fabric sorting and selection. The system includes a robotic arm, a camera, a visuotactile sensor, and a pressure sensor. It employs supervised fine-tuning and multimodal explanation-guided knowledge distillation to accurately classify and rank fabric properties. To facilitate further research, we release a dataset of 220 unique fabric samples, including RGB images and synchronized visuotactile and pressure data. Experimental results show that our Fabric-Llama-90B model consistently outperforms pretrained vision-language baselines in both property ranking accuracy and selection reliability.
Problem

Research questions and friction points this paper is trying to address.

Develop a robotic framework for fabric sorting and selection
Classify and rank fabric properties accurately
Outperform pretrained vision-language baselines in ranking accuracy and selection reliability
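The abstract evaluates models on "property ranking accuracy". The paper's exact metric is not given on this page, but a common way to score a predicted ranking against a ground-truth ranking is pairwise agreement (the fraction of item pairs ordered the same way, closely related to Kendall's tau). A minimal sketch, with the function name and score-list interface assumed for illustration:

```python
def pairwise_ranking_accuracy(predicted, reference):
    """Fraction of item pairs ordered the same way in both score lists.

    predicted/reference: per-sample property scores (e.g. softness) for the
    same set of fabric samples. Ties count as disagreements here.
    """
    n = len(predicted)
    agree = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            # The pair agrees if both score differences have the same sign.
            if (predicted[i] - predicted[j]) * (reference[i] - reference[j]) > 0:
                agree += 1
    return agree / total if total else 0.0
```

For example, a prediction that swaps only one adjacent pair out of three samples scores 2/3, while a perfect ranking scores 1.0.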
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal large language model (MLLM)-driven robotic framework integrating RGB, visuotactile, and pressure sensing
Supervised fine-tuning combined with multimodal explanation-guided knowledge distillation
Released Fabric-Llama-90B model outperforming pretrained vision-language baselines
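The page names knowledge distillation combined with supervised fine-tuning but gives no formula. A standard formulation of this combination is a weighted sum of hard-label cross-entropy and a temperature-softened KL divergence to the teacher's output distribution (Hinton-style distillation); the paper's explanation-guided variant adds supervision not sketched here. A minimal per-sample sketch, with all names (`distillation_loss`, `alpha`, `T`) assumed for illustration:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; larger T softens the distribution.
    exps = [math.exp(x / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, target, T=2.0, alpha=0.5):
    """Supervised fine-tuning + distillation loss for one sample.

    target: index of the ground-truth class (e.g. the softest fabric).
    alpha balances hard-label cross-entropy against the KL term;
    the T**2 factor keeps gradient scale comparable across temperatures.
    """
    ce = -math.log(softmax(student_logits)[target])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

When student and teacher logits coincide, the KL term vanishes and only the supervised cross-entropy remains, which is why the two objectives can be traded off with a single weight.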
Authors

Liman Wang
School of Physics, Engineering and Technology, University of York, York YO10 5DD, United Kingdom

Hanyang Zhong
School of Physics, Engineering and Technology, University of York, York YO10 5DD, United Kingdom

Tianyuan Wang
University of York
Robotics, Soft Robotics, Tensegrity, CPG

Shan Luo
Reader (Associate Professor), King's College London
Robotics, Robot Perception, Tactile Sensing, Computer Vision, Machine Learning

Jihong Zhu
School of Physics, Engineering and Technology, University of York, York YO10 5DD, United Kingdom