ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning

📅 2025-09-28

🤖 AI Summary
Robots face three key challenges in lifelong learning: slow adaptation to new tasks, plans that lack grounding in scene geometry and object physics, and unstable outputs from large language models (LLMs) and vision-language models (VLMs). To address these, we propose ViReSkill, a vision-grounded replanning framework paired with a reusable skill memory. The approach uses VLMs to ground plans in scene geometry and object physical properties, and LLMs for knowledge-informed high-level planning. When execution fails, the replanner generates a new action sequence conditioned on the current scene; when execution succeeds, the plan is stored as a skill and replayed in future encounters without additional LLM/VLM calls. Evaluated on LIBERO, RLBench, and a physical robot, the method consistently outperforms conventional baselines in task success rate. The result is a closed feedback loop for autonomous continual learning, in which each attempt expands the skill set and stabilizes subsequent executions.

📝 Abstract
Robots trained via Reinforcement Learning (RL) or Imitation Learning (IL) often adapt slowly to new tasks, whereas recent Large Language Models (LLMs) and Vision-Language Models (VLMs) promise knowledge-rich planning from minimal data. Deploying LLMs/VLMs for motion planning, however, faces two key obstacles: (i) symbolic plans are rarely grounded in scene geometry and object physics, and (ii) model outputs can vary for identical prompts, undermining execution reliability. We propose ViReSkill, a framework that pairs vision-grounded replanning with a skill memory for accumulation and reuse. When a failure occurs, the replanner generates a new action sequence conditioned on the current scene, tailored to the observed state. On success, the executed plan is stored as a reusable skill and replayed in future encounters without additional calls to LLMs/VLMs. This feedback loop enables autonomous continual learning: each attempt immediately expands the skill set and stabilizes subsequent executions. We evaluate ViReSkill on simulators such as LIBERO and RLBench as well as on a physical robot. Across all settings, it consistently outperforms conventional baselines in task success rate, demonstrating robust sim-to-real generalization.

Problem

Research questions and friction points this paper is trying to address.

Slow robot adaptation to new tasks in lifelong learning scenarios
Unreliable symbolic planning due to ungrounded geometry and physics
Inconsistent model outputs undermining execution reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-grounded replanning for scene adaptation
Skill memory for plan accumulation and reuse
Autonomous continual learning via feedback loop
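
The feedback loop described in the abstract (replan on failure, store on success, replay without further model calls) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `SkillMemory`, `run_task`, `observe`, `plan_with_model`, and `execute` are hypothetical, the planner and executor are left as caller-supplied callables standing in for the LLM/VLM and the robot, and keying skills by an exact task string is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Plan = List[str]  # a plan is modeled here as a list of action names (assumption)


@dataclass
class SkillMemory:
    """Hypothetical skill store: maps a task description to a plan that worked."""
    skills: Dict[str, Plan] = field(default_factory=dict)

    def lookup(self, task: str) -> Optional[Plan]:
        return self.skills.get(task)

    def store(self, task: str, plan: Plan) -> None:
        self.skills[task] = list(plan)


def run_task(
    task: str,
    observe: Callable[[], str],                  # returns a description of the current scene
    plan_with_model: Callable[[str, str], Plan], # stand-in for the LLM/VLM planner
    execute: Callable[[Plan], bool],             # stand-in for the robot; True on success
    memory: SkillMemory,
    max_attempts: int = 3,
) -> bool:
    """Try a stored skill first; on failure, replan from the current scene."""
    plan = memory.lookup(task)  # reuse a stored skill if one exists
    for _ in range(max_attempts):
        if plan is None:
            # No usable plan: query the planner, conditioned on the observed scene.
            plan = plan_with_model(task, observe())
        if execute(plan):
            # Success: consolidate the executed plan as a reusable skill.
            memory.store(task, plan)
            return True
        # Failure: discard the plan so the next attempt replans from the new state.
        plan = None
    return False


if __name__ == "__main__":
    planner_calls = {"n": 0}

    def demo_planner(task: str, scene: str) -> Plan:
        planner_calls["n"] += 1
        return [f"plan-{planner_calls['n']}"]

    def demo_executor(plan: Plan) -> bool:
        return plan != ["plan-1"]  # the first plan fails, later plans succeed

    memory = SkillMemory()
    run_task("open drawer", lambda: "drawer closed", demo_planner, demo_executor, memory)
    run_task("open drawer", lambda: "drawer closed", demo_planner, demo_executor, memory)
    print("planner calls:", planner_calls["n"])
```

In this sketch the second call to `run_task` for the same task is served from memory, so the planner is not queried again. A real system would likely need a more flexible retrieval scheme than exact string matching, and a check that a stored skill still fits the current scene.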