AI Summary
This work addresses the safe translation of natural-language instructions into local robotic control. Methodologically, it integrates retrieval-augmented generation (RAG) with RGB-D visual perception: task-relevant knowledge is retrieved from a vector database to guide a large language model in generating structured action sequences, while on-device perception (AprilTag detection with depth-fusion localization) and safety-aware control (workspace, force, and velocity constraints plus timeout-based retry) are implemented on the UFactory xArm manipulator. Key contributions include: (i) a maintainable RAG-powered robot knowledge base that substantially improves planning accuracy and task adaptability; (ii) a hardware-software co-designed safety gating mechanism that ensures execution robustness; and (iii) a decoupled architecture separating high-level reasoning from low-level control, enabling reproducible evaluation and real-world deployment. Experiments validate the system across tabletop scanning, approach-and-grasp, and placement tasks.
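The retrieve-then-plan loop summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the knowledge entries, the toy two-dimensional embeddings, the cosine-similarity retrieval, and the `plan_from_llm` stand-in for the actual LLM call are all assumptions made for clarity.

```python
import json
import math

# Hypothetical knowledge base: curated robot know-how paired with toy embeddings.
KNOWLEDGE = [
    {"text": "scan pattern: sweep camera over tabletop in a grid", "vec": [1.0, 0.0]},
    {"text": "grasp template: approach from above, close gripper", "vec": [0.0, 1.0]},
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k knowledge entries most similar to the query embedding."""
    ranked = sorted(KNOWLEDGE, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return [e["text"] for e in ranked[:k]]

def plan_from_llm(instruction, context):
    """Stand-in for the LLM call: in the real system, the retrieved context
    conditions the model to emit a JSON-structured action plan."""
    return json.dumps({
        "instruction": instruction,
        "context": context,
        "actions": [{"op": "scan"}, {"op": "approach"}, {"op": "grasp"}],
    })

# Retrieved context is passed to the planner; the plan is parsed and validated
# as JSON before any action reaches the controller.
plan = json.loads(plan_from_llm("pick up the cube", retrieve([0.0, 1.0])))
print([a["op"] for a in plan["actions"]])
```

Keeping the plan in a structured JSON schema is what lets the executor validate each action against safety limits before dispatching it to the arm.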
Abstract
We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural-language instructions to safe local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB-D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to produce JSON-structured action plans. Plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception fuses AprilTag detections with depth to produce object-centric metric poses. Execution is guarded by software safety gates: workspace bounds, speed and force caps, timeouts, and bounded retries. We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan, approach, and pick-place tasks. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.
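The safety gates named above can be sketched as a simple pre-dispatch check. The specific limit values, field names, and the `gate_waypoint` helper are illustrative assumptions; the paper's actual bounds and gating logic are not given in this section.

```python
from dataclasses import dataclass

@dataclass
class SafetyLimits:
    # Hypothetical limits; real values depend on the arm and workcell.
    workspace_min: tuple   # (x, y, z) lower corner, metres
    workspace_max: tuple   # (x, y, z) upper corner, metres
    max_speed: float       # commanded Cartesian speed cap, m/s
    max_force: float       # gripper/contact force cap, N
    timeout_s: float       # per-action timeout before a bounded retry
    max_retries: int       # retries before the plan is aborted

def gate_waypoint(pose, speed, limits):
    """Reject a commanded waypoint that leaves the workspace or exceeds the speed cap."""
    in_bounds = all(lo <= p <= hi for p, lo, hi in
                    zip(pose, limits.workspace_min, limits.workspace_max))
    return in_bounds and speed <= limits.max_speed

limits = SafetyLimits((-0.3, -0.4, 0.0), (0.5, 0.4, 0.5), 0.25, 10.0, 5.0, 3)
print(gate_waypoint((0.2, 0.0, 0.1), 0.1, limits))  # True: inside bounds, under cap
print(gate_waypoint((0.9, 0.0, 0.1), 0.1, limits))  # False: x outside workspace
```

Because every waypoint passes through the gate before reaching the controller, an invalid LLM-generated plan fails closed rather than moving the arm.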