SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robot policies often suffer from insufficient exploration due to action-mode collapse, and existing random-perturbation-based exploration methods are unsafe and behaviorally unstable. Method: The authors propose Self-Improvement via On-Manifold Exploration (SOE), a framework that learns a compact representation of task-relevant factors in a latent space and constrains exploration to the manifold of valid actions, jointly ensuring safety and diversity. SOE combines implicit manifold modeling, policy augmentation, and a plug-and-play exploration module, enabling seamless integration with arbitrary policy architectures and supporting human-in-the-loop guidance. Contribution/Results: Evaluated in both simulation and real-robot tasks, SOE significantly improves task success rates and sample efficiency while yielding smoother, more controllable exploration behavior, and it consistently outperforms state-of-the-art exploration baselines across diverse benchmarks.

📝 Abstract
Intelligent agents progress by continually refining their capabilities through actively exploring environments. Yet robot policies often lack sufficient exploration capability due to action mode collapse. Existing methods that encourage exploration typically rely on random perturbations, which are unsafe and induce unstable, erratic behaviors, thereby limiting their effectiveness. We propose Self-Improvement via On-Manifold Exploration (SOE), a framework that enhances policy exploration and improvement in robotic manipulation. SOE learns a compact latent representation of task-relevant factors and constrains exploration to the manifold of valid actions, ensuring safety, diversity, and effectiveness. It can be seamlessly integrated with arbitrary policy models as a plug-in module, augmenting exploration without degrading the base policy performance. Moreover, the structured latent space enables human-guided exploration, further improving efficiency and controllability. Extensive experiments in both simulation and real-world tasks demonstrate that SOE consistently outperforms prior methods, achieving higher task success rates, smoother and safer exploration, and superior sample efficiency. These results establish on-manifold exploration as a principled approach to sample-efficient policy self-improvement. Project website: https://ericjin2002.github.io/SOE
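The core idea of on-manifold exploration can be made concrete with a minimal sketch. The paper does not specify its architecture here, so the following is an assumption-laden illustration: a linear encoder/decoder pair stands in for SOE's learned action-manifold model, and exploration noise is injected in the latent space and then decoded, so perturbed actions remain on the model's estimate of the valid-action manifold rather than in arbitrary directions of the raw action space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder standing in for the learned
# action-manifold model (the paper's actual architecture is not given
# in this summary; these weights are illustrative only).
ACTION_DIM, LATENT_DIM = 6, 2
W_enc = rng.standard_normal((LATENT_DIM, ACTION_DIM)) * 0.1
W_dec = np.linalg.pinv(W_enc)  # decoder approximately inverts the encoder

def encode(action):
    """Project an action into the compact task-relevant latent space."""
    return W_enc @ action

def decode(z):
    """Map a latent code back to a full-dimensional action."""
    return W_dec @ z

def on_manifold_explore(action, noise_scale=0.05):
    """Perturb in latent space, then decode: the explored action stays
    on the (modeled) manifold of valid actions instead of being a raw
    random perturbation of the original action."""
    z = encode(action)
    z_perturbed = z + noise_scale * rng.standard_normal(LATENT_DIM)
    return decode(z_perturbed)

base_action = rng.standard_normal(ACTION_DIM)
explored = on_manifold_explore(base_action)
```

The contrast with random-perturbation exploration is that noise added directly to `base_action` can leave the valid-action manifold, whereas here every explored action is a decoded latent code by construction.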
Problem

Research questions and friction points this paper is trying to address.

Robot policies lack exploration due to action mode collapse
Random perturbation methods cause unsafe and unstable behaviors
Need for safe, diverse policy exploration in robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns compact latent task representation
Constrains exploration to safe action manifold
Plug-in module compatible with arbitrary policies
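The plug-in property described above can be sketched as a thin wrapper around an arbitrary base policy. This is not SOE's actual interface (which is not shown in this summary); the class name, method names, and the linear encoder/decoder are all hypothetical, chosen only to illustrate how an exploration module can augment any policy while leaving its greedy behavior untouched.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the learned action-manifold model.
ACTION_DIM, LATENT_DIM = 6, 2
W_enc = rng.standard_normal((LATENT_DIM, ACTION_DIM)) * 0.1
W_dec = np.linalg.pinv(W_enc)

class OnManifoldExplorationWrapper:
    """Hypothetical plug-in: wraps any base policy and adds latent-space
    exploration noise; with explore=False the base policy is returned
    unchanged, so base performance is not degraded."""

    def __init__(self, base_policy, encode, decode, latent_dim, noise_scale=0.05):
        self.base_policy = base_policy
        self.encode = encode
        self.decode = decode
        self.latent_dim = latent_dim
        self.noise_scale = noise_scale

    def act(self, obs, explore=True):
        action = self.base_policy(obs)
        if not explore:
            return action  # base policy output, untouched
        z = self.encode(action) + self.noise_scale * rng.standard_normal(self.latent_dim)
        return self.decode(z)

def base_policy(obs):
    """Toy base policy: any policy mapping observations to actions works."""
    return np.tanh(obs[:ACTION_DIM])

wrapper = OnManifoldExplorationWrapper(
    base_policy, lambda a: W_enc @ a, lambda z: W_dec @ z, LATENT_DIM
)
obs = rng.standard_normal(8)
a_greedy = wrapper.act(obs, explore=False)
a_explore = wrapper.act(obs)
```

Because exploration lives entirely in the wrapper, the same module could in principle be attached to diffusion, transformer, or MLP policies alike, which is the compatibility claim the bullet above makes.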