🤖 AI Summary
This work addresses the degradation in closed-loop reliability of vision-language-action (VLA) models under local deployment conditions. To tackle this challenge, the authors propose an online, success-memory-guided test-time adaptation framework that introduces, for the first time, a non-parametric memory mechanism grounded in successful experiences. Without updating model parameters, the method constructs an environment-specific action prior by storing successful trajectories in a long-term memory, retrieving state-relevant action chunks, filtering them via trajectory-level consistency, and aggregating the remaining candidates into an elite action prior. This prior then guides the frozen VLA policy in generating actions through confidence-adaptive modulation. Experimental results demonstrate that the proposed approach significantly improves both task success rates and closed-loop stability on long-horizon, multi-stage tasks, on both simulated and real-world robotic platforms.
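The store, filter, and aggregate steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `build_elite_prior`, the Euclidean nearest-neighbour retrieval, the median-based consistency filter, the distance-weighted average, and the confidence heuristic are all assumptions chosen for illustration.

```python
import numpy as np

def build_elite_prior(query, keys, chunks, k=5, keep=3):
    """Retrieve, filter, and aggregate stored successful action chunks.

    query:  (d,) embedding of the current observation (hypothetical encoder output)
    keys:   (n, d) embeddings stored alongside each successful segment
    chunks: (n, h, a) action chunks with horizon h and action dimension a
    Returns (prior, confidence): prior has shape (h, a), confidence lies in (0, 1].
    """
    k = min(k, len(keys))
    keep = min(keep, k)

    # 1) Retrieval: the k stored states nearest to the query (assumed metric: L2).
    dists = np.linalg.norm(keys - query, axis=1)
    idx = np.argsort(dists)[:k]
    cand = chunks[idx]

    # 2) Consistency filter (assumed rule): keep the candidates whose whole
    #    chunk deviates least from the element-wise median chunk.
    median = np.median(cand, axis=0)
    dev = np.linalg.norm((cand - median).reshape(k, -1), axis=1)
    elite = np.argsort(dev)[:keep]

    # 3) Aggregation: distance-weighted mean of the surviving chunks.
    w = np.exp(-dists[idx][elite])
    w = w / w.sum()
    prior = np.tensordot(w, cand[elite], axes=1)

    # Assumed confidence heuristic: high when the nearest stored state is
    # close and the elite chunks agree with each other.
    confidence = float(np.exp(-(dists[idx][0] + dev[elite].mean())))
    return prior, confidence
```

In this sketch an outlier chunk retrieved from a nearby state is discarded by the filter before averaging, so a single inconsistent memory entry does not distort the prior.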
📝 Abstract
Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, yet their closed-loop reliability often degrades under local deployment conditions. Existing evaluations typically treat test episodes as independent zero-shot trials. However, real robots often operate repeatedly in the same or slowly changing environments, where successful executions provide environment-verified evidence of reliable behavior patterns. We study this persistent-deployment setting, asking whether a partially competent frozen VLA can improve its reliability by reusing its successful test-time experience. We propose an online success-memory guided test-time adaptation framework for generative VLAs. During deployment, the robot stores progress-calibrated successful observation-action segments in a long-term memory. At inference, it retrieves state-relevant action chunks, filters inconsistent candidates via trajectory-level consistency, and aggregates them into an elite action prior. To incorporate this prior into action generation, we introduce confidence-adaptive prior guidance, which injects the elite prior into an intermediate state of the flow-matching action sampler and adjusts the guidance strength based on retrieval confidence. This design allows the frozen VLA to exploit environment-specific successful experience while preserving observation-conditioned generative refinement. This retrieve-then-steer mechanism enables lightweight, non-parametric test-time adaptation without requiring parameter updates. Simulation and real-world experiments show improved task success and closed-loop stability, especially in long-horizon and multi-stage tasks.
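One way to picture confidence-adaptive prior guidance is the sketch below. It assumes an Euler flow-matching sampler on a straight noise-to-action path and a hypothetical `velocity(x, t)` callable standing in for the frozen VLA's observation-conditioned action head. The injection time `t_inject`, the linear blend, and the `max_strength` cap are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def guided_sample(velocity, prior, confidence, shape, steps=10,
                  t_inject=0.5, max_strength=0.8, rng=None):
    """Euler flow-matching sampler with a single prior injection.

    velocity:   callable (x, t) -> dx/dt, standing in for the frozen policy
    prior:      elite action prior, same shape as the sampled action chunk
    confidence: retrieval confidence in [0, 1]; scales the guidance strength
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(shape)
    x = noise.copy()
    dt = 1.0 / steps
    inject_step = int(round(t_inject * steps))
    # Assumed rule: guidance strength grows linearly with confidence.
    strength = max_strength * float(np.clip(confidence, 0.0, 1.0))

    for i in range(steps):
        t = i * dt
        if i == inject_step:
            # Where the straight path from this noise to the prior sits at time t.
            x_prior = (1.0 - t) * noise + t * prior
            # Pull the intermediate state toward that point.
            x = (1.0 - strength) * x + strength * x_prior
        # The frozen model keeps refining the state after the injection.
        x = x + dt * velocity(x, t)
    return x
```

With `confidence = 0` the sampler reduces to the unguided frozen policy. With higher confidence the intermediate state moves closer to the prior, while the remaining integration steps still let the model adjust the action chunk to the current observation.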