🤖 AI Summary
To address low accuracy and poor interpretability in fabric attribute recognition for textile manufacturing, apparel production, and smart retail, this paper proposes a robotic sorting system driven by multimodal large language models (MLLMs). The system integrates RGB vision, visuotactile, and pressure-sensing data into an end-to-end framework for fabric attribute understanding and decision-making. We introduce a multimodal explanation-guided knowledge distillation method combined with supervised fine-tuning, improving both attribute ranking accuracy and decision interpretability. The released Fabric-Llama-90B model outperforms pretrained vision-language baselines on fabric attribute ranking and selection tasks. We also open-source a multimodal dataset of 220 fabric samples with synchronized RGB, visuotactile, and pressure data, establishing a new benchmark and resource for MLLM research in embodied interaction scenarios.
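The snippet below is a minimal sketch of how explanation-guided distillation might be combined with supervised fine-tuning; it is not the paper's implementation. The toy dimensions, variable names, mixing weight `alpha`, and temperature are all illustrative assumptions: the student model is supervised on ground-truth attribute labels (the SFT term) while also matching the token distribution of a teacher model's explanations (the distillation term).

```python
# Sketch only: combining supervised fine-tuning with explanation-guided
# distillation. All names and hyperparameters are assumptions, not the
# authors' released code.
import torch
import torch.nn.functional as F

vocab_size = 32000          # assumed tokenizer vocabulary size
batch, seq_len = 2, 16      # toy dimensions for illustration only

# Toy "student" logits; in practice these would come from the fine-tuned
# MLLM conditioned on RGB, visuotactile, and pressure inputs.
student_logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)

# Ground-truth answer tokens (e.g., the correct fabric-attribute ranking).
gt_tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Frozen teacher logits for its natural-language explanation of the decision.
teacher_logits = torch.randn(batch, seq_len, vocab_size)

# SFT term: standard next-token cross-entropy against ground-truth labels.
sft_loss = F.cross_entropy(
    student_logits.reshape(-1, vocab_size), gt_tokens.reshape(-1)
)

# Distillation term: KL divergence between student and teacher token
# distributions, so the student inherits the teacher's explanation, not
# just its final answer.
temperature = 2.0
kd_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature**2

# Weighted combination; alpha is an assumed hyperparameter.
alpha = 0.5
total_loss = alpha * sft_loss + (1 - alpha) * kd_loss
total_loss.backward()
print(float(sft_loss), float(kd_loss), float(total_loss))
```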
📝 Abstract
Choosing the right fabric is crucial for meeting functional and quality requirements in robotic applications for textile manufacturing, apparel production, and smart retail. We present MLLM-Fabric, a robotic framework powered by multimodal large language models (MLLMs) for fabric sorting and selection. The system includes a robotic arm, a camera, a visuotactile sensor, and a pressure sensor. It employs supervised fine-tuning and multimodal explanation-guided knowledge distillation to accurately classify and rank fabric properties. To facilitate further research, we release a dataset of 220 unique fabric samples, including RGB images and synchronized visuotactile and pressure data. Experimental results show that our Fabric-Llama-90B model consistently outperforms pretrained vision-language baselines in both property ranking accuracy and selection reliability.
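For concreteness, the following is a minimal sketch of how one synchronized sample from the released dataset could be represented in code. The directory layout, file names (`rgb.npy`, `visuotactile.npy`, `pressure.npy`), and the `FabricSample` structure are illustrative assumptions, not the dataset's documented schema.

```python
# Sketch of an assumed per-sample layout for the 220-fabric dataset:
# each fabric directory holds one RGB image, one visuotactile image, and
# a synchronized pressure trace. Names and shapes are assumptions.
from dataclasses import dataclass
from pathlib import Path

import numpy as np


@dataclass
class FabricSample:
    fabric_id: str            # one of the 220 unique fabric samples
    rgb: np.ndarray           # RGB image of the fabric, shape (H, W, 3)
    visuotactile: np.ndarray  # visuotactile sensor image, shape (H, W, 3)
    pressure: np.ndarray      # pressure readings during contact, shape (T,)


def load_sample(root: Path, fabric_id: str) -> FabricSample:
    """Load one fabric's synchronized modalities from an assumed layout."""
    sample_dir = root / fabric_id
    return FabricSample(
        fabric_id=fabric_id,
        rgb=np.load(sample_dir / "rgb.npy"),
        visuotactile=np.load(sample_dir / "visuotactile.npy"),
        pressure=np.load(sample_dir / "pressure.npy"),
    )
```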