Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the degradation in closed-loop reliability of vision-language-action (VLA) models under local deployment conditions. The authors propose an online, success-memory-guided test-time adaptation framework built on a non-parametric memory of successful experiences. Without updating model parameters, the method stores successful observation-action segments in a long-term memory, retrieves state-relevant action chunks at inference, filters inconsistent candidates via trajectory-level consistency, and aggregates the remainder into an elite action prior. This prior steers the frozen VLA policy's flow-matching action sampler, with guidance strength adjusted by retrieval confidence. Simulation and real-world experiments show improved task success and closed-loop stability, especially on long-horizon, multi-stage tasks.
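The retrieval, filtering, and aggregation steps summarized above can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function name `retrieve_elite_prior`, cosine similarity as the retrieval metric, distance to the median chunk as the consistency test, similarity-weighted averaging as the aggregation rule, and mean similarity as the confidence score.

```python
import numpy as np

def retrieve_elite_prior(query, keys, chunks, k=5, keep=3):
    """Hypothetical success-memory lookup (illustrative only).

    query:  (d,)      embedding of the current observation
    keys:   (n, d)    embeddings stored with successful segments
    chunks: (n, h, a) stored action chunks (horizon h, action dim a)
    Returns (prior, confidence): an (h, a) action prior and a score in [0, 1].
    """
    # Assumed retrieval metric: cosine similarity between query and keys.
    qn = query / (np.linalg.norm(query) + 1e-8)
    kn = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    sims = kn @ qn

    # Take the k most similar memory entries as candidates.
    top = np.argsort(-sims)[:k]
    cand = chunks[top]

    # Assumed consistency test: keep the candidates closest to the
    # element-wise median chunk, which discards outlier trajectories.
    median = np.median(cand, axis=0)
    dist = np.linalg.norm((cand - median).reshape(len(top), -1), axis=1)
    elite_idx = np.argsort(dist)[:keep]
    elite = cand[elite_idx]
    elite_sims = sims[top][elite_idx]

    # Assumed aggregation: similarity-weighted average of the elite chunks.
    w = np.clip(elite_sims, 0.0, None)
    if w.sum() <= 0.0:
        w = np.ones_like(w)
    w = w / w.sum()
    prior = np.tensordot(w, elite, axes=1)

    # Assumed confidence: mean similarity of the elite set, clipped to [0, 1].
    confidence = float(np.clip(elite_sims.mean(), 0.0, 1.0))
    return prior, confidence
```

In this sketch a low confidence value would signal that the memory holds nothing close to the current state, so a caller could fall back to the unguided policy.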
📝 Abstract
Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, yet their closed-loop reliability often degrades under local deployment conditions. Existing evaluations typically treat test episodes as independent zero-shot trials. However, real robots often operate repeatedly in the same or slowly changing environments, where successful executions provide environment-verified evidence of reliable behavior patterns. We study this persistent-deployment setting, asking whether a partially competent frozen VLA can improve its reliability by reusing its successful test-time experience. We propose an online success-memory guided test-time adaptation framework for generative VLAs. During deployment, the robot stores progress-calibrated successful observation-action segments in a long-term memory. At inference, it retrieves state-relevant action chunks, filters inconsistent candidates via trajectory-level consistency, and aggregates them into an elite action prior. To incorporate this prior into action generation, we introduce confidence-adaptive prior guidance, which injects the elite prior into an intermediate state of the flow-matching action sampler and adjusts the guidance strength based on retrieval confidence. This design allows the frozen VLA to exploit environment-specific successful experience while preserving observation-conditioned generative refinement. This retrieve-then-steer mechanism enables lightweight, non-parametric test-time adaptation without requiring parameter updates. Simulation and real-world experiments show improved task success and closed-loop stability, especially in long-horizon and multi-stage tasks.
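The confidence-adaptive prior guidance described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's formulation: `velocity_fn` is a hypothetical stand-in for the frozen VLA's observation-conditioned velocity field, the sampler is plain Euler integration over a straight noise-to-action path, and the injection rule (blend the current sample with the noised prior at one intermediate step, scaled by `max_strength * confidence`) is one plausible form chosen for clarity.

```python
import numpy as np

def guided_sample(velocity_fn, prior, confidence, steps=10,
                  t_inject=0.5, max_strength=0.8, rng=None):
    """Euler flow-matching sampler with a one-time prior injection.

    velocity_fn(x, t) -> array shaped like x (hypothetical frozen policy head)
    prior:      (h, a) elite action prior retrieved from memory
    confidence: retrieval confidence in [0, 1]
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(prior.shape)
    x = noise.copy()
    dt = 1.0 / steps
    inject_step = min(int(round(t_inject * steps)), steps - 1)

    # Assumed schedule: guidance strength grows linearly with confidence.
    lam = max_strength * float(np.clip(confidence, 0.0, 1.0))

    for i in range(steps):
        t = i * dt
        if i == inject_step:
            # Where the straight noise-to-prior path would be at time t.
            x_prior_t = (1.0 - t) * noise + t * prior
            # Pull the intermediate state toward that point.
            x = (1.0 - lam) * x + lam * x_prior_t
        # Remaining steps let the frozen model refine the steered state.
        x = x + dt * velocity_fn(x, t)
    return x
```

Injecting at an intermediate time rather than at the end is what leaves room for the remaining integration steps to refine the steered sample against the current observation. With `confidence=0` the sketch reduces to ordinary unguided sampling.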
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
test-time adaptation
success memory
closed-loop reliability
persistent deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation
success memory
retrieve-then-steer
flow-matching guidance
non-parametric adaptation
Jianchao Zhao
College of Artificial Intelligence, Xi’an Jiaotong University
Huoren Yang
College of Artificial Intelligence, Xi’an Jiaotong University
Hu Yusong
One Robotics
Yuyang Gao
Computer Science, Emory University
Data Mining, Machine Learning, Deep Learning
Qiguan Ou
One Robotics
Cong Wan
Xi'an Jiaotong University
AIGC, 3D, diffusion
SongLin Dong
Shenzhen University of Advanced Technology
Zhiheng Ma
Shenzhen University of Advanced Technology
Yihong Gong
Xi'an Jiaotong University
Multimedia content analysis, Machine learning, Pattern recognition