🤖 AI Summary
This work addresses the limitations of existing vision–language–action (VLA) systems in long-horizon robotic tasks, which suffer from fragmented data-collection, policy-learning, and deployment pipelines; heavy reliance on manual resets; and fragile multi-policy execution. To overcome these challenges, the authors propose RoboClaw, a framework that unifies perception, decision-making, and control within a single vision–language model (VLM). RoboClaw introduces the novel entangled action pair (EAP) mechanism, enabling a self-resetting loop for continuous online policy refinement and end-to-end semantically consistent task execution. Experiments on physical robots demonstrate that this approach improves long-horizon task success rates by 25%, reduces human labor by 53.7%, and significantly enhances system robustness, stability, and scalability.
📝 Abstract
Vision-Language-Action (VLA) systems have shown strong potential for language-driven robotic manipulation. However, scaling them to long-horizon tasks remains challenging. Existing pipelines typically separate data collection, policy learning, and deployment, resulting in heavy reliance on manual environment resets and brittle multi-policy execution. We present RoboClaw, an agentic robotics framework that unifies data collection, policy learning, and task execution under a single VLM-driven controller. At the policy level, RoboClaw introduces Entangled Action Pairs (EAP), which couple forward manipulation behaviors with inverse recovery actions to form self-resetting loops for autonomous data collection. This mechanism enables continuous on-policy data acquisition and iterative policy refinement with minimal human intervention. During deployment, the same agent performs high-level reasoning and dynamically orchestrates learned policy primitives to accomplish long-horizon tasks. By maintaining consistent contextual semantics across collection and execution, RoboClaw reduces the mismatch between the two phases and improves multi-policy robustness. Experiments on real-world manipulation tasks demonstrate improved stability and scalability compared to conventional open-loop pipelines, while significantly reducing human effort throughout the robot lifecycle: RoboClaw achieves a 25% improvement in success rate over baseline methods on long-horizon tasks and reduces human time investment by 53.7%.
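The self-resetting loop behind Entangled Action Pairs can be illustrated with a minimal sketch: a forward behavior is paired with an inverse recovery action, and running them back-to-back restores the scene so the next rollout needs no manual reset. All class and function names below are hypothetical illustrations, not RoboClaw's actual API, and the "environment" is a toy dictionary standing in for a real robot scene.

```python
# Hypothetical sketch of an Entangled Action Pair (EAP) self-resetting
# data-collection loop, as described in the abstract. Names and structure
# are illustrative assumptions, not RoboClaw's real interface.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EntangledActionPair:
    """Couples a forward manipulation behavior with its inverse recovery action."""
    forward: Callable[[Dict], Dict]   # e.g. "open the drawer"
    inverse: Callable[[Dict], Dict]   # e.g. "close the drawer" (restores the scene)


@dataclass
class Trajectory:
    steps: List[Dict] = field(default_factory=list)


def collect_self_resetting(eap: EntangledActionPair, state: Dict, episodes: int):
    """Run forward/inverse rollouts back-to-back: each inverse rollout resets
    the scene for the next forward rollout, so data collection continues
    without manual environment resets."""
    dataset: List[Trajectory] = []
    for _ in range(episodes):
        traj = Trajectory()
        state = eap.forward(state)       # forward rollout (logged for training)
        traj.steps.append(dict(state))
        state = eap.inverse(state)       # inverse rollout restores the scene
        traj.steps.append(dict(state))
        dataset.append(traj)
    return dataset, state


# Toy example: a drawer that the pair opens (forward) and closes (inverse).
eap = EntangledActionPair(
    forward=lambda s: {**s, "drawer": "open"},
    inverse=lambda s: {**s, "drawer": "closed"},
)
data, final_state = collect_self_resetting(eap, {"drawer": "closed"}, episodes=3)
# After every episode the scene is back in its initial state, ready for the next.
```

In a real system the forward and inverse callables would be learned policy primitives, and the collected trajectories would feed the iterative on-policy refinement the paper describes.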