AssemMate: Graph-Based LLM for Robotic Assembly Assistance

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-driven robotic assembly systems suffer from verbose textual knowledge representations, resulting in inefficient reasoning, poor real-time performance, and excessive context length. To address these limitations, this paper proposes a graph-augmented large language model framework: (1) it introduces knowledge graphs (KGs) into LLM-based assembly systems for structured, semantically rich assembly knowledge representation; (2) it designs a self-supervised graph convolutional network (GCN) to achieve cross-modal alignment between graph-structured knowledge and natural language, integrated with vision-enhanced perception for robust handling of complex occluded and stacked scenes and precise grasping. The framework unifies natural language interaction, task planning, and low-level action execution. Experiments in both simulation and real-world settings demonstrate a 6.4% improvement in assembly accuracy, a threefold increase in inference speed, a 28× reduction in context length, and strong generalization across unseen tasks and object configurations.

📝 Abstract
Large Language Model (LLM)-based robotic assembly assistance has gained significant research attention. It requires the injection of domain-specific knowledge to guide the assembly process through natural language interaction with humans. Despite some progress, existing methods represent knowledge as natural language text; the long context and redundant content make it difficult to meet robots' requirements for real-time and precise reasoning. To bridge this gap, we present AssemMate, which takes the graph, a concise and accurate form of knowledge representation, as input. This graph-based LLM enables knowledge graph question answering (KGQA), supporting human-robot interaction and assembly task planning for specific products. Beyond interactive QA, AssemMate also supports sensing stacked scenes and executing grasps to assist with assembly. Specifically, a self-supervised Graph Convolutional Network (GCN) encodes knowledge graph entities and relations into a latent space and aligns them with the LLM's representation, enabling the LLM to understand graph information. In addition, a vision-enhanced strategy is employed to handle stacked scenes during grasping. In training and evaluation, AssemMate outperforms existing methods, achieving 6.4% higher accuracy, 3 times faster inference, and 28 times shorter context length, while demonstrating strong generalization on random graphs. Robotic grasping experiments in both simulated and real-world settings further demonstrate its superiority. More details can be found on the project page: https://github.com/cristina304/AssemMate.git
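The core idea in the abstract, a GCN that encodes knowledge-graph entities and projects them into the LLM's embedding space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single GCN layer, the tiny 3-entity graph, all dimensions, and the random weights are hypothetical placeholders.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN layer: propagate entity features over the
    symmetrically normalized adjacency, then apply ReLU."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt      # D^-1/2 (A+I) D^-1/2
    return np.maximum(norm_adj @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],                          # toy 3-entity assembly graph
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
entity_feats = rng.standard_normal((3, 8))          # initial entity features
w_gcn = rng.standard_normal((8, 16))                # GCN weight (illustrative)
w_align = rng.standard_normal((16, 32))             # projection into LLM space

graph_emb = gcn_layer(adj, entity_feats, w_gcn)     # latent graph embeddings
llm_aligned = graph_emb @ w_align                   # vectors the LLM can attend to
print(llm_aligned.shape)                            # (3, 32)
```

In the paper's framework the alignment is learned self-supervised rather than fixed, so that graph tokens replace the long textual knowledge dump that inflates context length.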
Problem

Research questions and friction points this paper is trying to address.

Improving real-time reasoning for robotic assembly via graph-based knowledge representation
Enabling precise assembly task planning through knowledge graph question answering
Addressing stacked scene challenges in robotic grasping with vision enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based LLM for assembly knowledge representation
Self-supervised GCN encodes knowledge graph entities
Vision-enhanced strategy handles stacked grasping scenes
Qi Zheng
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Chaoran Zhang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Zijian Liang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
EnTe Lin
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Shubo Cui
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qinghongbing Xie
Tsinghua University
Zhaobo Xu
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Long Zeng
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China