SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robot policies often suffer from insufficient exploration due to action-mode collapse, and existing random-perturbation-based exploration methods are unsafe and behaviorally unstable. Method: The authors propose Self-Improvement via On-Manifold Exploration (SOE), a framework that learns a compact representation of task-relevant factors in a latent space and constrains exploration to the manifold of valid actions, jointly ensuring safety and diversity. SOE combines implicit manifold modeling, policy augmentation, and a plug-and-play exploration module, enabling seamless integration with arbitrary policy architectures and supporting human-in-the-loop guidance. Contribution/Results: Evaluated in both simulation and real-robot tasks, SOE significantly improves task success rates and sample efficiency while yielding smoother, more controllable exploration behavior, and it consistently outperforms state-of-the-art exploration baselines across diverse benchmarks.

📝 Abstract
Intelligent agents progress by continually refining their capabilities through actively exploring environments. Yet robot policies often lack sufficient exploration capability due to action mode collapse. Existing methods that encourage exploration typically rely on random perturbations, which are unsafe and induce unstable, erratic behaviors, thereby limiting their effectiveness. We propose Self-Improvement via On-Manifold Exploration (SOE), a framework that enhances policy exploration and improvement in robotic manipulation. SOE learns a compact latent representation of task-relevant factors and constrains exploration to the manifold of valid actions, ensuring safety, diversity, and effectiveness. It can be seamlessly integrated with arbitrary policy models as a plug-in module, augmenting exploration without degrading the base policy performance. Moreover, the structured latent space enables human-guided exploration, further improving efficiency and controllability. Extensive experiments in both simulation and real-world tasks demonstrate that SOE consistently outperforms prior methods, achieving higher task success rates, smoother and safer exploration, and superior sample efficiency. These results establish on-manifold exploration as a principled approach to sample-efficient policy self-improvement. Project website: https://ericjin2002.github.io/SOE
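The core idea of on-manifold exploration can be made concrete with a minimal sketch. The paper does not specify its architecture here, so the following is an assumption-laden illustration: a linear encoder/decoder pair stands in for SOE's learned action-manifold model, and exploration noise is injected in the latent space and then decoded, so perturbed actions remain on the model's estimate of the valid-action manifold rather than in arbitrary directions of the raw action space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder standing in for the learned
# action-manifold model (the paper's actual architecture is not given
# in this summary; these weights are illustrative only).
ACTION_DIM, LATENT_DIM = 6, 2
W_enc = rng.standard_normal((LATENT_DIM, ACTION_DIM)) * 0.1
W_dec = np.linalg.pinv(W_enc)  # decoder approximately inverts the encoder

def encode(action):
    """Project an action into the compact task-relevant latent space."""
    return W_enc @ action

def decode(z):
    """Map a latent code back to a full-dimensional action."""
    return W_dec @ z

def on_manifold_explore(action, noise_scale=0.05):
    """Perturb in latent space, then decode: the explored action stays
    on the (modeled) manifold of valid actions instead of being a raw
    random perturbation of the original action."""
    z = encode(action)
    z_perturbed = z + noise_scale * rng.standard_normal(LATENT_DIM)
    return decode(z_perturbed)

base_action = rng.standard_normal(ACTION_DIM)
explored = on_manifold_explore(base_action)
```

The contrast with random-perturbation exploration is that noise added directly to `base_action` can leave the valid-action manifold, whereas here every explored action is a decoded latent code by construction.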
Problem

Research questions and friction points this paper is trying to address.

Robot policies lack exploration due to action mode collapse
Random perturbation methods cause unsafe and unstable behaviors
Need for safe, diverse policy exploration in robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns compact latent task representation
Constrains exploration to safe action manifold
Plug-in module compatible with arbitrary policies
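The plug-in property described above can be sketched as a thin wrapper around an arbitrary base policy. This is not SOE's actual interface (which is not shown in this summary); the class name, method names, and the linear encoder/decoder are all hypothetical, chosen only to illustrate how an exploration module can augment any policy while leaving its greedy behavior untouched.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the learned action-manifold model.
ACTION_DIM, LATENT_DIM = 6, 2
W_enc = rng.standard_normal((LATENT_DIM, ACTION_DIM)) * 0.1
W_dec = np.linalg.pinv(W_enc)

class OnManifoldExplorationWrapper:
    """Hypothetical plug-in: wraps any base policy and adds latent-space
    exploration noise; with explore=False the base policy is returned
    unchanged, so base performance is not degraded."""

    def __init__(self, base_policy, encode, decode, latent_dim, noise_scale=0.05):
        self.base_policy = base_policy
        self.encode = encode
        self.decode = decode
        self.latent_dim = latent_dim
        self.noise_scale = noise_scale

    def act(self, obs, explore=True):
        action = self.base_policy(obs)
        if not explore:
            return action  # base policy output, untouched
        z = self.encode(action) + self.noise_scale * rng.standard_normal(self.latent_dim)
        return self.decode(z)

def base_policy(obs):
    """Toy base policy: any policy mapping observations to actions works."""
    return np.tanh(obs[:ACTION_DIM])

wrapper = OnManifoldExplorationWrapper(
    base_policy, lambda a: W_enc @ a, lambda z: W_dec @ z, LATENT_DIM
)
obs = rng.standard_normal(8)
a_greedy = wrapper.act(obs, explore=False)
a_explore = wrapper.act(obs)
```

Because exploration lives entirely in the wrapper, the same module could in principle be attached to diffusion, transformer, or MLP policies alike, which is the compatibility claim the bullet above makes.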