AssemMate: Graph-Based LLM for Robotic Assembly Assistance

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-driven robotic assembly systems suffer from verbose textual knowledge representations, resulting in inefficient reasoning, poor real-time performance, and excessive context length. To address these limitations, this paper proposes a graph-augmented large language model framework: (1) it introduces knowledge graphs (KGs) into LLM-based assembly systems for structured, semantically rich assembly knowledge representation; (2) it designs a self-supervised graph convolutional network (GCN) to achieve cross-modal alignment between graph-structured knowledge and natural language, integrated with vision-enhanced perception for robust handling of complex occluded and stacked scenes and precise grasping. The framework unifies natural language interaction, task planning, and low-level action execution. Experiments in both simulation and real-world settings demonstrate a 6.4% improvement in assembly accuracy, a threefold increase in inference speed, a 28× reduction in context length, and strong generalization across unseen tasks and object configurations.

📝 Abstract
Large Language Model (LLM)-based robotic assembly assistance has gained significant research attention. It requires the injection of domain-specific knowledge to guide the assembly process through natural language interaction with humans. Despite some progress, existing methods represent knowledge as natural language text; the long context and redundant content make it difficult to meet robots' requirements for real-time and precise reasoning. To bridge this gap, we present AssemMate, which takes the graph, a concise and accurate form of knowledge representation, as input. This graph-based LLM enables knowledge graph question answering (KGQA), supporting human-robot interaction and assembly task planning for specific products. Beyond interactive QA, AssemMate also supports sensing stacked scenes and executing grasps to assist with assembly. Specifically, a self-supervised Graph Convolutional Network (GCN) encodes knowledge graph entities and relations into a latent space and aligns them with the LLM's representation, enabling the LLM to understand graph information. In addition, a vision-enhanced strategy is employed to handle stacked scenes during grasping. In training and evaluation, AssemMate outperforms existing methods, achieving 6.4% higher accuracy, 3 times faster inference, and 28 times shorter context length, while demonstrating strong generalization on random graphs. Robotic grasping experiments in both simulated and real-world settings further demonstrate its superiority. More details can be found on the project page: https://github.com/cristina304/AssemMate.git
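The core idea in the abstract, a GCN that encodes knowledge-graph entities and projects them into the LLM's embedding space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single GCN layer, the tiny 3-entity graph, all dimensions, and the random weights are hypothetical placeholders.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN layer: propagate entity features over the
    symmetrically normalized adjacency, then apply ReLU."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt      # D^-1/2 (A+I) D^-1/2
    return np.maximum(norm_adj @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],                          # toy 3-entity assembly graph
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
entity_feats = rng.standard_normal((3, 8))          # initial entity features
w_gcn = rng.standard_normal((8, 16))                # GCN weight (illustrative)
w_align = rng.standard_normal((16, 32))             # projection into LLM space

graph_emb = gcn_layer(adj, entity_feats, w_gcn)     # latent graph embeddings
llm_aligned = graph_emb @ w_align                   # vectors the LLM can attend to
print(llm_aligned.shape)                            # (3, 32)
```

In the paper's framework the alignment is learned self-supervised rather than fixed, so that graph tokens replace the long textual knowledge dump that inflates context length.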
Problem

Research questions and friction points this paper is trying to address.

Improving real-time reasoning for robotic assembly via graph-based knowledge representation
Enabling precise assembly task planning through knowledge graph question answering
Addressing stacked scene challenges in robotic grasping with vision enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based LLM for assembly knowledge representation
Self-supervised GCN encodes knowledge graph entities
Vision-enhanced strategy handles stacked grasping scenes
Qi Zheng
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Chaoran Zhang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Zijian Liang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
EnTe Lin
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Shubo Cui
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qinghongbing Xie
Tsinghua University
Zhaobo Xu
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Long Zeng
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China