RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of building embodied intelligence for robots deployed at the network edge, where cloud connectivity is intermittent and computational and memory resources are severely constrained. The authors propose the first closed-loop reinforcement learning (RL) training framework for lightweight LLMs designed specifically for embodied AI, eliminating reliance on large-model distillation or supervised fine-tuning (SFT). Extending the R1-Zero mathematical-reasoning paradigm to physical-interaction settings, the method enables small language models (e.g., Qwen2.5-1.5B/3B) to autonomously acquire embodied reasoning and decision-making capabilities via online environmental feedback. In autonomous driving tasks, Qwen2.5-1.5B achieves a 20.2-percentage-point accuracy gain over SFT baselines, and Qwen2.5-3B attains a control adaptability of 63.3%, surpassing cloud-based GPT-4o (58.5%). These results empirically validate the feasibility of cloud-independent, resource-efficient embodied intelligence and show it can outperform much larger cloud-bound models.

📝 Abstract
Future robotic systems operating in real-world environments will require on-board embodied intelligence without continuous cloud connection, balancing capabilities with constraints on computational power and memory. This work presents an extension of the R1-Zero approach, which enables the use of low-parameter-count Large Language Models (LLMs) in the robotic domain. The R1-Zero approach was originally developed to enable mathematical reasoning in LLMs using static datasets. We extend it to the robotics domain by integrating it into a closed-loop Reinforcement Learning (RL) framework. This extension enhances reasoning in Embodied Artificial Intelligence (Embodied AI) settings without relying solely on distillation of large models through Supervised Fine-Tuning (SFT). We show that small-scale LLMs can achieve effective reasoning performance by learning through closed-loop interaction with their environment, which enables tasks that previously required significantly larger models. In an autonomous driving setting, a performance gain of 20.2 percentage points over the SFT-based baseline is observed with a Qwen2.5-1.5B model. Using the proposed training procedure, Qwen2.5-3B achieves a 63.3% control adaptability score, surpassing the 58.5% obtained by the much larger, cloud-bound GPT-4o. These results highlight that practical, on-board deployment of small LLMs is not only feasible but can outperform larger models if trained through environmental feedback, underscoring the importance of an interactive learning framework for robotic Embodied AI, one grounded in practical experience rather than static supervision.
Problem

Research questions and friction points this paper is trying to address.

Enabling robotic intelligence without cloud reliance
Enhancing small LLMs via closed-loop reinforcement learning
Achieving superior performance with compact on-board models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-loop RL enhances small LLMs' robotic reasoning
Low-parameter LLMs outperform larger models via interaction
Environmental feedback training boosts autonomous driving performance
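The core loop behind these contributions — the policy proposes an action, the environment returns a scalar reward, and the policy is updated from that feedback alone, with no supervised labels or distillation — can be illustrated with a toy REINFORCE loop. Everything below (the action set, `env_reward`, `TinyPolicy`) is a hypothetical stand-in for illustration only; the paper itself trains Qwen2.5 models with RL in a driving environment, not a tabular policy.

```python
import math
import random

# Illustrative action set for a driving-style decision task.
ACTIONS = ["brake", "steer_left", "steer_right", "accelerate"]

def env_reward(state, action):
    """Hypothetical environment: reward 1.0 when the chosen action
    handles the hazard in the state, 0.0 otherwise."""
    return 1.0 if action == state["correct_action"] else 0.0

class TinyPolicy:
    """Softmax policy over discrete actions, updated with REINFORCE."""
    def __init__(self, lr=0.1):
        self.prefs = {a: 0.0 for a in ACTIONS}
        self.lr = lr

    def probs(self):
        exps = {a: math.exp(p) for a, p in self.prefs.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def sample(self):
        r, acc = random.random(), 0.0
        for a, pa in self.probs().items():
            acc += pa
            if r <= acc:
                return a
        return ACTIONS[-1]

    def update(self, action, reward, baseline):
        # REINFORCE with a running-average baseline: shift preference
        # toward the sampled action in proportion to its advantage.
        p = self.probs()
        adv = reward - baseline
        for a in ACTIONS:
            grad = (1.0 - p[a]) if a == action else -p[a]
            self.prefs[a] += self.lr * adv * grad

random.seed(0)
policy = TinyPolicy()
state = {"correct_action": "brake"}  # fixed hazard, for illustration
avg_r = 0.0
for step in range(500):
    a = policy.sample()          # act
    r = env_reward(state, a)     # receive environmental feedback
    policy.update(a, r, avg_r)   # improve from reward alone
    avg_r += 0.05 * (r - avg_r)  # running baseline

p = policy.probs()
print(max(p, key=p.get))  # dominant action after closed-loop training
```

The point of the sketch is the information flow, not the optimizer: the only training signal is the environment's reward, which is exactly the closed-loop setup the paper argues lets small models learn what static supervision cannot teach them.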