AI Summary
This work addresses the safe translation of natural-language instructions into local robotic control. Methodologically, it integrates retrieval-augmented generation (RAG) with RGB-D visual perception: task-relevant knowledge is retrieved from a vector database to guide a large language model in generating structured action sequences, while on-device perception (AprilTag detection with depth-fusion localization) and safety-aware control (workspace, force, and velocity constraints plus timeout-based retry) are implemented on the UFactory xArm manipulator. Key contributions include: (i) a maintainable RAG-powered robot knowledge base that substantially improves planning accuracy and task adaptability; (ii) a hardware-software co-designed safety gating mechanism that ensures execution robustness; and (iii) a decoupled architecture separating high-level reasoning from low-level control, enabling reproducible evaluation and real-world deployment. Experiments validate the system across tabletop scanning, approach-and-grasp, and placement tasks.
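The retrieve-then-plan loop summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the knowledge entries, the toy two-dimensional embeddings, the cosine-similarity retrieval, and the `plan_from_llm` stand-in for the actual LLM call are all assumptions made for clarity.

```python
import json
import math

# Hypothetical knowledge base: curated robot know-how paired with toy embeddings.
KNOWLEDGE = [
    {"text": "scan pattern: sweep camera over tabletop in a grid", "vec": [1.0, 0.0]},
    {"text": "grasp template: approach from above, close gripper", "vec": [0.0, 1.0]},
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k knowledge entries most similar to the query embedding."""
    ranked = sorted(KNOWLEDGE, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return [e["text"] for e in ranked[:k]]

def plan_from_llm(instruction, context):
    """Stand-in for the LLM call: in the real system, the retrieved context
    conditions the model to emit a JSON-structured action plan."""
    return json.dumps({
        "instruction": instruction,
        "context": context,
        "actions": [{"op": "scan"}, {"op": "approach"}, {"op": "grasp"}],
    })

# Retrieved context is passed to the planner; the plan is parsed and validated
# as JSON before any action reaches the controller.
plan = json.loads(plan_from_llm("pick up the cube", retrieve([0.0, 1.0])))
print([a["op"] for a in plan["actions"]])
```

Keeping the plan in a structured JSON schema is what lets the executor validate each action against safety limits before dispatching it to the arm.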
Abstract
We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural-language instructions to safe local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB-D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to produce JSON-structured action plans. Plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception fuses AprilTag detections with depth to produce object-centric metric poses. Execution is guarded by software safety gates: workspace bounds, speed and force caps, timeouts, and bounded retries. We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan, approach, and pick-place tasks. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.
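The safety gates named above can be sketched as a simple pre-dispatch check. The specific limit values, field names, and the `gate_waypoint` helper are illustrative assumptions; the paper's actual bounds and gating logic are not given in this section.

```python
from dataclasses import dataclass

@dataclass
class SafetyLimits:
    # Hypothetical limits; real values depend on the arm and workcell.
    workspace_min: tuple   # (x, y, z) lower corner, metres
    workspace_max: tuple   # (x, y, z) upper corner, metres
    max_speed: float       # commanded Cartesian speed cap, m/s
    max_force: float       # gripper/contact force cap, N
    timeout_s: float       # per-action timeout before a bounded retry
    max_retries: int       # retries before the plan is aborted

def gate_waypoint(pose, speed, limits):
    """Reject a commanded waypoint that leaves the workspace or exceeds the speed cap."""
    in_bounds = all(lo <= p <= hi for p, lo, hi in
                    zip(pose, limits.workspace_min, limits.workspace_max))
    return in_bounds and speed <= limits.max_speed

limits = SafetyLimits((-0.3, -0.4, 0.0), (0.5, 0.4, 0.5), 0.25, 10.0, 5.0, 3)
print(gate_waypoint((0.2, 0.0, 0.1), 0.1, limits))  # True: inside bounds, under cap
print(gate_waypoint((0.9, 0.0, 0.1), 0.1, limits))  # False: x outside workspace
```

Because every waypoint passes through the gate before reaching the controller, an invalid LLM-generated plan fails closed rather than moving the arm.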