RISE: Self-Improving Robot Policy with Compositional World Model

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision-language-action models often fail in dynamic, contact-rich robotic manipulation because minor execution deviations compound into failures, while on-policy reinforcement learning in the real world is limited by safety risk, hardware cost, and environment resets. To address these limitations, this work proposes RISE, a framework for closed-loop policy self-improvement entirely in imagination through a Compositional World Model. RISE uses a controllable dynamics model to predict multi-view future observations and a separate progress value model to evaluate imagined outcomes and produce advantage signals, which drive policy updates without physical interaction. On three real-world tasks (dynamic brick sorting, backpack packing, and box closing), RISE improves over prior methods, with absolute gains of more than 35% in dynamic brick sorting, 45% in backpack packing, and 35% in box closing.

📝 Abstract

Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows state and value to be modeled with distinct, best-suited architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvements over prior art, with more than +35% absolute performance increase in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.
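
The closed loop described in the abstract (imagine rollouts, estimate advantages, update the policy) can be illustrated with a deliberately tiny toy. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D state, the three-action set, the hand-written `dynamics_model` and `progress_value` functions, and the exact softmax policy-gradient update are all hypothetical stand-ins chosen for illustration. In RISE the dynamics and value components are learned models operating on multi-view observations.

```python
import math

ACTIONS = (-1, 0, 1)  # toy action set (assumption, not from the paper)
GOAL = 5              # toy goal position (assumption, not from the paper)


def dynamics_model(state, action):
    """Hypothetical stand-in for a learned dynamics model: predict the next state."""
    return max(0, min(GOAL, state + action))


def progress_value(state):
    """Hypothetical stand-in for a progress value model: fraction of the task done."""
    return state / GOAL


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def improve_in_imagination(logits, iterations=50, horizon=5, lr=1.0):
    """Toy closed loop: imagine outcomes, score them, update a softmax policy."""
    logits = list(logits)
    for _ in range(iterations):
        state = 0  # every imagined rollout starts from the same toy state
        for _ in range(horizon):
            probs = softmax(logits)
            # Imagine each action from the current state and measure progress gained.
            gains = [progress_value(dynamics_model(state, a)) - progress_value(state)
                     for a in ACTIONS]
            # Advantage = progress gain relative to the policy's expected gain.
            baseline = sum(p * g for p, g in zip(probs, gains))
            advantages = [g - baseline for g in gains]
            # Exact softmax policy-gradient step: dE[gain]/dlogit_i = p_i * advantage_i.
            logits = [l + lr * p * adv
                      for l, p, adv in zip(logits, probs, advantages)]
            # Advance the imagined rollout with the currently most likely action.
            state = dynamics_model(state, ACTIONS[probs.index(max(probs))])
    return softmax(logits)


# No "real" environment is ever queried: all feedback comes from the two models.
policy = improve_in_imagination([0.0, 0.0, 0.0])
```

Starting from a uniform policy, the returned distribution concentrates on the action that increases imagined progress. Keeping `dynamics_model` and `progress_value` as separate functions mirrors the compositional idea in the abstract: each component can be swapped for a different architecture or objective without touching the update loop.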

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

contact-rich manipulation

dynamic manipulation tasks

on-policy reinforcement learning

physical world constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional World Model

Imaginative Reinforcement Learning

Vision-Language-Action Models

Self-Improving Policy

Model-Based Robotics

🔎 Similar Papers

Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance

2024-05-22, arXiv.org, Citations: 1

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

2024-04-28, arXiv.org, Citations: 15