On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning

📅 2026-01-11
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work proposes TT-VLA, a novel framework that introduces test-time reinforcement learning to vision-language-action (VLA) models, enabling online adaptation during deployment without requiring retraining. Existing VLA models lack the ability to adapt at test time, limiting their robustness in dynamic environments. TT-VLA addresses this by fine-tuning policies at inference using task-progress signals, integrating a dense reward mechanism with prior-preserving techniques to maintain stable and effective behavior. Experiments demonstrate that the method significantly improves task success rates, policy stability, and adaptability to unseen environmental dynamics in both simulated and real-world settings, thereby enhancing the practical deployability of VLA models.

📝 Abstract
Vision-Language-Action models have recently emerged as a powerful paradigm for general-purpose robot learning, enabling agents to map visual observations and natural-language instructions into executable robotic actions. Though popular, they are primarily trained via supervised fine-tuning or training-time reinforcement learning, requiring explicit fine-tuning phases, human intervention, or controlled data collection. Consequently, existing methods remain unsuitable for challenging simulated- or physical-world deployments, where robots must respond autonomously and flexibly to evolving environments. To address this limitation, we introduce Test-Time Reinforcement Learning for VLAs (TT-VLA), a framework that enables on-the-fly policy adaptation during inference. TT-VLA formulates a dense reward mechanism that leverages step-by-step task-progress signals to refine action policies at test time while preserving the SFT/RL-trained priors, making it an effective supplement to current VLA models. Empirical results show that our approach enhances overall adaptability, stability, and task success in dynamic, previously unseen scenarios under both simulated and real-world settings. We believe TT-VLA offers a principled step toward self-improving, deployment-ready VLAs.
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
test-time adaptation
reinforcement learning
autonomous robot learning
dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Reinforcement Learning
Vision-Language-Action Models
On-the-Fly Adaptation
Dense Reward
Deployment-Ready Robotics