Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

πŸ“… 2026-03-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of catastrophic forgetting in vision-language-action (VLA) models during continual reinforcement learning. To mitigate this issue, the authors propose a simple yet effective approach that combines sequential fine-tuning with low-rank adaptation (LoRA), applied within an on-policy reinforcement learning framework to large pretrained VLA models. Extensive experiments across three prominent VLA architectures and five lifelong reinforcement learning benchmarks demonstrate that the method significantly alleviates forgetting and maintains high plasticity without relying on complex continual learning mechanisms. Moreover, it achieves strong zero-shot generalization performance, often surpassing existing, more intricate continual reinforcement learning methods. The results highlight the approach’s remarkable stability, scalability, and practical utility in real-world continual learning scenarios.

πŸ“ Abstract
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
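The core recipe, sequentially fine-tuning only low-rank adapters while the pretrained backbone stays frozen, can be illustrated with a minimal sketch. This is not the paper's code: the `LoRALinear` class, `seq_ft_step` helper, dimensions, and the placeholder gradient standing in for an on-policy RL objective are all illustrative assumptions; it only shows the structural point that tasks arrive in sequence and gradients touch just the low-rank factors `A` and `B`.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-adapted linear layer: y = x @ (W + (alpha/r) * B @ A).T"""
    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable low-rank factor
        self.B = np.zeros((d_out, r))                    # trainable; zero-init so the delta starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # The effective weight is W + scale * B @ A; W itself is never updated.
        return x @ (self.W + self.scale * self.B @ self.A).T

def seq_ft_step(layer, x, grad_out, lr=1e-2):
    """One fine-tuning step: gradients flow only into A and B, never into W."""
    # For y = x @ (W + s*B@A).T, dL/dM = grad_out.T @ x with M the effective weight.
    layer.B -= lr * layer.scale * grad_out.T @ (x @ layer.A.T)
    layer.A -= lr * layer.scale * (layer.B.T @ grad_out.T) @ x

layer = LoRALinear(d_in=8, d_out=4)
W_before = layer.W.copy()
for task in range(3):                 # tasks arrive one after another (sequential fine-tuning)
    x = np.ones((2, 8))               # placeholder observations for this task
    grad_out = np.ones((2, 4))        # placeholder gradient from an RL policy objective
    seq_ft_step(layer, x, grad_out)
assert np.allclose(layer.W, W_before)  # the pretrained weights are untouched across tasks
```

In practice the paper applies this idea to full VLA architectures trained with on-policy RL, where only the adapter parameters change between tasks; the frozen backbone is what preserves zero-shot generalization while the adapters provide plasticity.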
Problem

Research questions and friction points this paper is trying to address.

Continual Reinforcement Learning
Vision-Language-Action Models
Catastrophic Forgetting
Lifelong Learning
Sequential Fine-Tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Reinforcement Learning
Vision-Language-Action Models
Sequential Fine-Tuning
Low-Rank Adaptation
Catastrophic Forgetting
πŸ”Ž Similar Papers
No similar papers found.