🤖 AI Summary
Traditional robotic approaches are limited in generalization, training efficiency, and interpretability, which hinders their ability to adapt continuously from environmental feedback. This work proposes the Evolvable Embodied Agent (EEAgent) framework, which uses large vision-language models (VLMs) for environmental perception and task planning, and introduces a Long Short-Term Reflective Optimization (LSTRO) mechanism. LSTRO dynamically combines past experiences with newly learned lessons to iteratively refine prompts, enabling continual self-evolution of the agent. Evaluated on six VIMA-Bench tasks, the method sets a new state of the art and notably outperforms existing baselines in complex scenarios.
📝 Abstract
Achieving general-purpose robotics requires empowering robots to adapt and evolve based on their environment and feedback. Traditional methods face limitations such as extensive training requirements, difficulties in cross-task generalization, and lack of interpretability. Prompt learning offers new opportunities for self-evolving robots without extensive training. However, simply reflecting on past experiences is not enough: extracting meaningful insights from task successes and failures remains a challenge. To this end, we propose the evolvable embodied agent (EEAgent) framework, which leverages large vision-language models (VLMs) for better environmental interpretation and policy planning. To enhance reflection on past experiences, we propose a long short-term reflective optimization (LSTRO) mechanism that dynamically refines prompts based on both past experiences and newly learned lessons. This facilitates continuous self-evolution and improves overall task success rates. Evaluations on six VIMA-Bench tasks show that our approach sets a new state of the art, notably outperforming baselines in complex scenarios.