ARRC: Advanced Reasoning Robot Control - Knowledge-Driven Autonomous Manipulation Using Retrieval-Augmented Generation

2025-10-06
Citations: 0
Influential: 0
AI Summary
This work addresses the safe translation of natural-language instructions into local robotic control. It integrates retrieval-augmented generation (RAG) with RGB-D visual perception: task-relevant knowledge is retrieved from a vector database to guide a large language model in generating structured action sequences. On-device perception (AprilTag detection fused with depth for localization) and safety-aware control (workspace, force, and velocity constraints, plus timeouts and bounded retries) are implemented on a UFactory xArm manipulator. Key contributions include: (i) a maintainable RAG-powered robot knowledge base that improves planning accuracy and task adaptability; (ii) a safety-gating mechanism that guards execution on the hardware; and (iii) a decoupled architecture separating high-level reasoning from low-level control, enabling reproducible evaluation and real-world deployment. Experiments evaluate the system on tabletop scanning, approach-and-grasp, and place tasks.
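
The summary does not spell out how tag detections are fused with depth. As a minimal sketch, assuming a pinhole camera model with no lens distortion, one common approach is to take the pixel at the tag center, read the aligned depth there, and back-project it into the camera frame. The `Intrinsics` class, the function names, and the intrinsic values used in the usage note are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical container, not from the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def tag_center(corners):
    """Approximate a tag's center as the mean of its four corner pixels."""
    us = [c[0] for c in corners]
    vs = [c[1] for c in corners]
    return sum(us) / len(us), sum(vs) / len(vs)


def deproject(u, v, depth_m, K):
    """Back-project pixel (u, v) with depth in metres into the camera frame.

    Assumes an ideal pinhole model; lens distortion is ignored.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive (0 usually means no reading)")
    x = (u - K.cx) / K.fx * depth_m
    y = (v - K.cy) / K.fy * depth_m
    return (x, y, depth_m)
```

With `K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)`, calling `deproject(*tag_center(corners), depth_m, K)` yields a camera-frame point in metres. Turning that into a robot-frame pose would additionally need a camera-to-base calibration, which this sketch leaves out.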

Abstract
We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural-language instructions to safe local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB-D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to produce JSON-structured action plans. Plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception uses AprilTag detections fused with depth to produce object-centric metric poses. Execution is enforced via software safety gates: workspace bounds, speed and force caps, timeouts, and bounded retries. We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan, approach, and pick-place tasks. Experimental results demonstrate the efficacy of the proposed approach. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.
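
To make the idea of gating a JSON-structured plan concrete, the following is a minimal sketch, not the authors' implementation. The plan schema (`steps`, `action`, `x`, `y`, `z`, `speed`), the `SafetyLimits` class, and every numeric limit are assumptions chosen for illustration. Only workspace bounds, a speed cap, and bounded retries are shown; force caps and timeouts would need feedback from the arm and are omitted.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    # All values are illustrative assumptions, not limits from the paper.
    x_mm: tuple = (150.0, 650.0)
    y_mm: tuple = (-400.0, 400.0)
    z_mm: tuple = (20.0, 500.0)
    max_speed_mm_s: float = 100.0


def check_step(step, limits):
    """Return a list of violations for one plan step (empty means it passes)."""
    problems = []
    if step.get("action") != "move":
        return problems  # only motion steps are checked in this sketch
    bounds = (("x", limits.x_mm), ("y", limits.y_mm), ("z", limits.z_mm))
    for axis, (lo, hi) in bounds:
        value = step.get(axis)
        if not isinstance(value, (int, float)) or not (lo <= value <= hi):
            problems.append(f"{axis}={value!r} outside [{lo}, {hi}] mm")
    speed = step.get("speed", limits.max_speed_mm_s)
    if not isinstance(speed, (int, float)) or not (0 < speed <= limits.max_speed_mm_s):
        problems.append(f"speed={speed!r} exceeds cap {limits.max_speed_mm_s} mm/s")
    return problems


def gate_plan(plan_json, limits):
    """Parse a JSON plan and return (ok, violations) without executing anything."""
    try:
        plan = json.loads(plan_json)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    steps = plan.get("steps") if isinstance(plan, dict) else None
    if not isinstance(steps, list):
        return False, ["plan has no 'steps' list"]
    violations = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            violations.append(f"step {i}: not an object")
            continue
        violations += [f"step {i}: {p}" for p in check_step(step, limits)]
    return not violations, violations


def run_with_retries(action, max_retries=2):
    """Call action() until it returns True, using at most 1 + max_retries attempts.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_retries + 2):
        if action():
            return attempt
    return None
```

A plan that fails `gate_plan` would be rejected before any motion command is sent, which keeps the language model's output advisory rather than authoritative.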
Problem

Research questions and friction points this paper is trying to address.

Connects natural-language instructions to safe robotic control
Uses Retrieval-Augmented Generation for knowledge-driven autonomous manipulation
Improves plan validity and adaptability while maintaining local control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining Retrieval-Augmented Generation with RGB-D perception
Indexing robot knowledge in vector database for context retrieval
Enforcing execution via software safety gates and bounds
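
The retrieval step named above can be illustrated with a toy sketch. It assumes a bag-of-words embedding and cosine similarity as a stand-in for the learned embedding model and vector database a real system would use. The knowledge entries, function names, and prompt wording are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge entries, written only for this example.
KNOWLEDGE = [
    "pick: approach object from above, close gripper, lift slowly",
    "place: move above target, lower, open gripper, retreat",
    "scan: sweep camera across table to detect tags",
]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k=2):
    """Return the k entries most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(instruction, docs, k=2):
    """Assemble an LLM prompt that conditions on the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(instruction, docs, k))
    return (
        f"Context:\n{context}\n"
        f"Instruction: {instruction}\n"
        "Respond with a JSON action plan."
    )
```

Calling `build_prompt("scan the table", KNOWLEDGE)` places the scan entry first in the context. Keeping the knowledge in plain entries like these is what makes such a base easy to maintain: adding a task template means adding one more indexed entry, with no retraining.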
Eugene Vorobiov
Department of Mechanical, Industrial and Mechatronics Engineering, Toronto Metropolitan University, Toronto, Canada
Ammar Jaleel Mahmood
Department of Mechanical, Industrial and Mechatronics Engineering, Toronto Metropolitan University, Toronto, Canada
Salim Rezvani
Department of Mechanical, Industrial and Mechatronics Engineering, Toronto Metropolitan University, Toronto, Canada
Robin Chhabra
Professor of Robotics & Mechatronics, Toronto Metropolitan University
Soft Robotics, Embodied AI, Multi-Robot Systems, Robotic Self Perception, Geometric Mechanics