SIME: Enhancing Policy Self-Improvement with Modal-level Exploration

📅 2025-05-02

📈 Citations: 0

✨ Influential: 0

career value

219K/year

🤖 AI Summary

Robot policy self-improvement often suffers from capability stagnation, hindering the generation of high-value novel interaction data. Method: We propose a modality-level exploration mechanism that stimulates diverse interactions via multimodal behavioral sampling, coupled with uncertainty-aware data filtering and policy-gradient-driven online policy updates—forming a closed-loop pipeline of “demonstration → exploration → filtering → evolution.” Contribution/Results: This work is the first to integrate modality-level exploration with dynamic data selection, overcoming fundamental bottlenecks in knowledge generation inherent to conventional approaches. Evaluated on both simulation and real-robot platforms, our method achieves over 40% improvement in task success rate, reduces training cost by approximately 35%, and significantly enhances policy generalization and autonomous evolutionary capability.

Technology Category

Application Category

📝 Abstract

Self-improvement requires robotic systems to initially learn from human-provided data and then gradually enhance their capabilities through interaction with the environment. This is similar to how humans improve their skills through continuous practice. However, achieving effective self-improvement is challenging, primarily because robots tend to repeat their existing abilities during interactions, often failing to generate new, valuable data for learning. In this paper, we identify the key to successful self-improvement: modal-level exploration and data selection. By incorporating a modal-level exploration mechanism during policy execution, the robot can produce more diverse and multi-modal interactions. At the same time, we select the most valuable trials and high-quality segments from these interactions for learning. We successfully demonstrate effective robot self-improvement on both simulation benchmarks and real-world experiments. The capability for self-improvement will enable us to develop more robust and high-success-rate robotic control strategies at a lower cost. Our code and experiment scripts are available at https://ericjin2002.github.io/SIME/

Problem

Research questions and friction points this paper is trying to address.

Enhancing robotic self-improvement through modal-level exploration

Overcoming repetition in robot interactions to generate valuable data

Selecting high-quality interaction segments for effective learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modal-level exploration for diverse interactions

Selecting valuable trials for learning

Enhancing policy self-improvement effectively

🔎 Similar Papers

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL