Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

📅 2026-05-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing vision-language-action (VLA) models generalize poorly to novel environments and rely heavily on large-scale demonstration data. This work proposes an agentic online adaptation framework built on three components: adaptive reward synthesis, which decomposes complex tasks into sub-goals for curriculum learning; language-guided exploration, in which a critic model steers systematic rather than random exploration; and an experience memory that stores and retrieves task-relevant policy weights to warm-start adaptation to similar tasks. On the LIBERO benchmark, the method reports gains of +12.3% on long-horizon tasks and +28.5% in 1-shot learning, raises cross-task transfer from 0% to 31.2% without task-specific demonstrations, and converges 2.4× faster than existing online adaptation methods. It also retains its advantage on the dual-arm RoboTwin 2.0 benchmark, including the randomized Hard setting.
📝 Abstract
Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensive demonstrations. We introduce Agentic-VLA, an agentic training framework that enables VLAs to efficiently adapt online through three key innovations: (1) Adaptive Reward Synthesis, which dynamically generates and adjusts reward functions based on the VLA's current capabilities and task complexity, decomposing complex tasks into learnable sub-goals for curriculum learning; (2) Language-Guided Exploration, where a critic model provides structured guidance for systematic exploration rather than random sampling; and (3) Experience Memory, which stores and retrieves task-relevant policy weights for warm-starting adaptation to similar tasks. We evaluate Agentic-VLA on the LIBERO benchmark, achieving substantial improvements: +12.3% on long-horizon tasks, +28.5% in 1-shot learning, and enabling cross-task transfer from 0% to 31.2% without task-specific demonstrations. Our framework also demonstrates 2.4x faster convergence compared to existing online adaptation methods. Beyond LIBERO, Agentic-VLA retains its advantage on the dual-arm RoboTwin 2.0 benchmark, including under its randomized Hard setting. These results establish Agentic-VLA as a significant step toward truly adaptive VLA systems capable of continuous learning in deployment.
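As a rough illustration of the Experience Memory idea (store policy weights per task, retrieve the closest match to warm-start a new task), here is a minimal sketch. It is not the paper's implementation: the abstract does not say how tasks are keyed or compared, so the task embeddings, the cosine-similarity lookup, the `min_similarity` threshold, and the names `ExperienceMemory`, `store`, and `retrieve` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ExperienceMemory:
    # Hypothetical store: each entry pairs a task embedding with the
    # policy weights saved after adapting to that task.
    entries: list = field(default_factory=list)

    def store(self, task_embedding, weights):
        self.entries.append((list(task_embedding), weights))

    def retrieve(self, query_embedding, min_similarity=0.5):
        """Return the weights of the most similar stored task, or None
        if no stored task reaches the (assumed) similarity threshold."""
        best_score, best_weights = min_similarity, None
        for embedding, weights in self.entries:
            score = cosine(query_embedding, embedding)
            if score >= best_score:
                best_score, best_weights = score, weights
        return best_weights


# Toy usage: weights are stand-in dicts rather than real parameters.
memory = ExperienceMemory()
memory.store([1.0, 0.0], {"name": "pick_cup"})
memory.store([0.0, 1.0], {"name": "open_drawer"})

warm_start = memory.retrieve([0.9, 0.1])    # close to the first task
no_match = memory.retrieve([-1.0, 0.0])     # dissimilar to both tasks
```

When `retrieve` returns `None`, a caller would fall back to the base policy instead of a warm start; the threshold keeps unrelated tasks from seeding adaptation.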
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
generalization
training efficiency
robotic manipulation
online adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic-VLA
Adaptive Reward Synthesis
Language-Guided Exploration
Experience Memory
Online Adaptation