CRL-VLA: Continual Vision-Language-Action Learning

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of balancing stability and plasticity in embodied agents performing continual learning in open-world environments. To this end, the authors propose a continual reinforcement learning framework tailored for vision–language–action models. The approach employs an asymmetric regulation mechanism: it constrains the magnitude of the advantage function on previously learned tasks to mitigate catastrophic forgetting, while permitting controlled policy updates on new tasks to facilitate adaptation. This is further supported by a dual-critic architecture with a Goal-Conditioned Value Formulation (GCVF) and a policy divergence constraint, which together ensure semantic consistency while preserving learning flexibility. Evaluated on the LIBERO benchmark, the method demonstrates significant improvements over existing approaches in both resistance to forgetting and forward transfer capability.

📝 Abstract
Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (VLA) models to master dexterous manipulation through environmental interaction. Thus, Continual Reinforcement Learning (CRL) is a promising pathway for deploying VLA models in lifelong robotic scenarios, yet balancing stability (retaining old skills) and plasticity (learning new ones) remains a formidable challenge for existing methods. We introduce CRL-VLA, a framework for continual post-training of VLA models with rigorous theoretical bounds. We derive a unified performance bound linking the stability-plasticity trade-off to goal-conditioned advantage magnitude, scaled by policy divergence. CRL-VLA resolves this dilemma via asymmetric regulation: constraining advantage magnitudes on prior tasks while enabling controlled growth on new tasks. This is realized through a simple but effective dual-critic architecture with novel Goal-Conditioned Value Formulation (GCVF), where a frozen critic anchors semantic consistency and a trainable estimator drives adaptation. Experiments on the LIBERO benchmark demonstrate that CRL-VLA effectively harmonizes these conflicting objectives, outperforming baselines in both anti-forgetting and forward adaptation.
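The asymmetric regulation described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the one-step advantage form, the function names, and the clipping thresholds (`eps_old`, `eps_new`) are all hypothetical choices standing in for the paper's actual bounds.

```python
import numpy as np

def goal_conditioned_advantage(reward, value_s, value_s_next, gamma=0.99):
    """One-step goal-conditioned advantage: A = r + gamma * V(s', g) - V(s, g).

    value_s / value_s_next would come from the goal-conditioned critics
    (frozen critic for prior tasks, trainable estimator for new tasks).
    """
    return reward + gamma * value_s_next - value_s

def asymmetric_regulation(advantage, is_prior_task, eps_old=0.05, eps_new=0.5):
    """Asymmetrically bound the advantage magnitude:
    a tight bound on prior tasks (stability / anti-forgetting),
    a looser bound on new tasks (plasticity / forward adaptation)."""
    bound = eps_old if is_prior_task else eps_new
    return float(np.clip(advantage, -bound, bound))

# A large raw advantage is clamped hard on an old task, gently on a new one.
raw_adv = goal_conditioned_advantage(reward=1.0, value_s=0.2, value_s_next=0.3)
old_task_adv = asymmetric_regulation(raw_adv, is_prior_task=True)
new_task_adv = asymmetric_regulation(raw_adv, is_prior_task=False)
```

The key design point the abstract argues for is that a single symmetric bound cannot serve both goals: the same clip that protects old skills would also throttle learning on new tasks, hence the two regimes.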
Problem

Research questions and friction points this paper is trying to address.

Continual Reinforcement Learning
Vision-Language-Action
stability-plasticity trade-off
lifelong learning
embodied agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Reinforcement Learning
Vision-Language-Action
Stability-Plasticity Trade-off
Dual-Critic Architecture
Goal-Conditioned Value Formulation
👥 Authors
Qixin Zeng
University of Southampton, UK
Shuo Zhang
Westlake University, China
Hongyin Zhang
Westlake University, China
Renjie Wang
Westlake University, China
Han Zhao
Zhejiang University | Westlake University
Embodied intelligence · Reinforcement learning · Multimodal large language models · Control theory
Libang Zhao
Westlake University, China
Runze Li
Westlake University, China
Donglin Wang
Westlake University
Deep Reinforcement Learning · Meta Learning · Robot Learning
Chao Huang
Associate Professor at University of Southampton, UK
Reinforcement Learning · Formal Methods · Robotics