Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision-Language-Action (VLA) models are often brittle under minor environmental shifts, which the authors attribute to trajectory overfitting. This work proposes PDF (Perturbation learning with Delayed Feedback), a verifier-free test-time adaptation framework that combines uncertainty-based data augmentation, action voting, and a lightweight perturbation module, without fine-tuning the base model. An adaptive scheduler allocates the augmentation budget to balance performance and efficiency, and delayed feedback guides the perturbation module in retrospectively adjusting action logits to correct overconfidence. On the LIBERO benchmark, PDF improves task success rate by 7.4%; on Atari, it raises the human-normalized score by 10.3. Both gains are reported over the vanilla VLA and over VLA with existing test-time adaptation.

📝 Abstract

Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute this brittleness to trajectory overfitting, where VLAs over-attend to the spurious correlation between actions and entities, then reproduce memorized action patterns. We propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that improves decision performance without fine-tuning the base model. PDF mitigates the spurious correlation through uncertainty-based data augmentation and action voting, while an adaptive scheduler allocates augmentation budgets to balance performance and efficiency. To further improve stability, PDF learns a lightweight perturbation module that retrospectively adjusts action logits guided by delayed feedback, correcting overconfidence. Experiments on LIBERO (+7.4% success rate) and Atari (+10.3 human-normalized score) demonstrate consistent gains of PDF in task success over vanilla VLA and VLA with test-time adaptation, establishing a practical path toward reliable test-time adaptation in multimodal decision-making agents. The code is available at https://github.com/zhoujiahuan1991/CVPR2026-PDF.
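
The abstract names three mechanisms: an uncertainty-driven augmentation budget, action voting over augmented views, and a perturbation on action logits updated from delayed feedback. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. Every name here (`augmentation_budget`, `vote`, `LogitPerturbation`, `max_views`, `lr`) is hypothetical, and the entropy-based budget rule, majority vote, and additive-bias update are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def augmentation_budget(probs, max_views=8):
    """Assumed scheduler: request more augmented views when the
    normalized entropy of the action distribution is high."""
    h = -(probs * np.log(probs + 1e-12)).sum()
    u = float(np.clip(h / np.log(len(probs)), 0.0, 1.0))
    return 1 + int(round(u * (max_views - 1)))


def vote(logits_per_view):
    """Majority vote over the per-view argmax actions.
    Ties resolve to the lowest action index."""
    actions = np.argmax(logits_per_view, axis=-1)
    counts = np.bincount(actions, minlength=logits_per_view.shape[-1])
    return int(np.argmax(counts))


class LogitPerturbation:
    """Assumed form of a lightweight perturbation: an additive bias on
    action logits, adjusted once feedback for past actions arrives."""

    def __init__(self, n_actions, lr=0.1):
        self.delta = np.zeros(n_actions)
        self.lr = lr

    def apply(self, logits):
        return logits + self.delta

    def update(self, taken_actions, feedback):
        # feedback is an assumed scalar in [-1, 1]; negative values
        # lower the logits of the actions that led to it.
        for a in taken_actions:
            self.delta[a] += self.lr * feedback
```

In this sketch a confident prediction uses a single view while a near-uniform one uses the full budget, and a negative delayed signal reduces the logits of the actions that preceded it, which is one simple way to damp overconfidence.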

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

trajectory overfitting

spurious correlation

environmental shifts

test-time adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Adaptation

Vision-Language-Action Models

Delayed Feedback