Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the degradation in closed-loop reliability of vision-language-action (VLA) models under local deployment conditions, focusing on robots that operate repeatedly in the same or slowly changing environments. The authors propose an online, success-memory-guided test-time adaptation framework built on a non-parametric memory of the robot's own successful experience. Without updating model parameters, the method stores progress-calibrated successful observation-action segments in a long-term memory, retrieves state-relevant action chunks at inference, filters inconsistent candidates via trajectory-level consistency, and aggregates the remainder into an elite action prior. This prior is injected into an intermediate state of the frozen VLA's flow-matching action sampler, with guidance strength scaled by retrieval confidence. Simulation and real-world experiments show improved task success and closed-loop stability, especially on long-horizon, multi-stage tasks.
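The memory side of the pipeline (store, retrieve, filter, aggregate) can be illustrated with a small sketch. This is not the authors' implementation: the names `SuccessMemory`, `cosine`, and `elite_prior`, the cosine-similarity retrieval, the median-distance consistency test, and the confidence formula are all assumptions chosen to make the idea concrete, and progress calibration is omitted.

```python
import math
import statistics
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class SuccessMemory:
    """Hypothetical store of (state embedding, action chunk) pairs
    taken only from episodes that ended in success."""
    keys: list = field(default_factory=list)
    chunks: list = field(default_factory=list)

    def add(self, key, chunk):
        self.keys.append(list(key))
        self.chunks.append(list(chunk))

    def retrieve(self, query, k=5):
        """Return up to k (similarity, chunk) pairs, most similar first."""
        scored = [(cosine(query, key), chunk)
                  for key, chunk in zip(self.keys, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def elite_prior(candidates, max_dist=0.5):
    """Drop candidates far from the coordinate-wise median (an assumed
    stand-in for trajectory-level consistency), then average the rest
    weighted by similarity. Returns (prior, confidence)."""
    if not candidates:
        return None, 0.0
    dim = len(candidates[0][1])
    median = [statistics.median(c[i] for _, c in candidates) for i in range(dim)]
    elite = [(s, c) for s, c in candidates if math.dist(c, median) <= max_dist]
    weights = [max(s, 0.0) for s, _ in elite]
    total = sum(weights)
    if total == 0.0:
        return None, 0.0
    prior = [sum(w * c[i] for w, (_, c) in zip(weights, elite)) / total
             for i in range(dim)]
    # Assumed confidence: mean elite similarity times the fraction kept.
    confidence = (total / len(elite)) * (len(elite) / len(candidates))
    return prior, confidence
```

A caller would add chunks after each successful episode, then call `retrieve` followed by `elite_prior` at every control step; a `None` prior signals that the frozen policy should act unguided.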

📝 Abstract

Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, yet their closed-loop reliability often degrades under local deployment conditions. Existing evaluations typically treat test episodes as independent zero-shot trials. However, real robots often operate repeatedly in the same or slowly changing environments, where successful executions provide environment-verified evidence of reliable behavior patterns. We study this persistent-deployment setting, asking whether a partially competent frozen VLA can improve its reliability by reusing its successful test-time experience. We propose an online success-memory guided test-time adaptation framework for generative VLAs. During deployment, the robot stores progress-calibrated successful observation-action segments in a long-term memory. At inference, it retrieves state-relevant action chunks, filters inconsistent candidates via trajectory-level consistency, and aggregates them into an elite action prior. To incorporate this prior into action generation, we introduce confidence-adaptive prior guidance, which injects the elite prior into an intermediate state of the flow-matching action sampler and adjusts the guidance strength based on retrieval confidence. This design allows the frozen VLA to exploit environment-specific successful experience while preserving observation-conditioned generative refinement. This retrieve-then-steer mechanism enables lightweight, non-parametric test-time adaptation without requiring parameter updates. Simulation and real-world experiments show improved task success and closed-loop stability, especially in long-horizon and multi-stage tasks.
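One way to picture confidence-adaptive prior guidance is the toy sampler below. It is a sketch under stated assumptions, not the paper's algorithm: the abstract does not give the injection rule, so the linear noise-to-data path, the single injection step, the blend formula, and the names `steer_sample`, `toy_velocity`, `inject_step`, and `max_strength` are all hypothetical. The velocity function stands in for the frozen VLA's flow-matching head.

```python
def toy_velocity(x, t):
    """Stand-in for the frozen policy's velocity field: pulls every
    coordinate toward 0.0 (a hypothetical 'model mode')."""
    return [0.0 - xi for xi in x]


def steer_sample(velocity, prior, confidence, noise,
                 steps=10, inject_step=5, max_strength=0.8):
    """Euler-integrate a flow from noise (t=0) toward an action (t=1).

    At one intermediate step the state is blended toward the prior,
    re-noised to the current time, with a strength proportional to
    retrieval confidence. The remaining steps let the frozen model
    refine the steered state.
    """
    x = list(noise)
    dt = 1.0 / steps
    # Guidance strength grows with confidence and is capped at max_strength.
    strength = max(0.0, min(1.0, confidence)) * max_strength
    for i in range(steps):
        t = i * dt
        if prior is not None and i == inject_step:
            # Point on the straight noise-to-prior path at time t (assumed form).
            target = [(1.0 - t) * n + t * p for n, p in zip(noise, prior)]
            x = [(1.0 - strength) * xi + strength * ti
                 for xi, ti in zip(x, target)]
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Usage: low confidence leaves the sample unchanged, high confidence
# moves it toward the retrieved prior.
unguided = steer_sample(toy_velocity, None, 0.0, [0.5, -0.5])
guided = steer_sample(toy_velocity, [1.0, 1.0], 1.0, [0.5, -0.5])
```

Injecting mid-trajectory rather than at the final step is what leaves room for observation-conditioned refinement: the model still integrates several steps after the prior is applied, and with zero confidence the sampler reduces to the unmodified policy.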

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

test-time adaptation

success memory

closed-loop reliability

persistent deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation

success memory

retrieve-then-steer