GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning

📅 2026-05-05
🤖 AI Summary
This work addresses the performance degradation of vision-language models under distribution shifts at test time, where ground-truth labels are unavailable. It introduces Group Relative Policy Optimization (GRPO) into test-time adaptation (TTA) for the first time, proposing a label-free, probability-driven optimization framework. The approach formulates class-specific prompt prediction as a group-wise policy optimization problem, constructing output groups by sampling the top-K classes from CLIP similarity distributions. Alignment and dispersion reward functions guide fine-tuning of the visual encoder without relying on ground-truth labels. Experiments across multiple benchmarks show that the method consistently outperforms existing TTA approaches, with the largest gains under natural distribution shifts.
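The group-construction step (top-K classes from a CLIP similarity distribution) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code. It assumes precomputed image and class-prompt embeddings and a logit scale of 100, the value commonly used with pretrained CLIP. The function name `topk_candidate_group` is hypothetical. Keeping the K most probable classes deterministically, rather than sampling them stochastically, is also an assumption.

```python
import numpy as np


def topk_candidate_group(image_emb, text_embs, k=5, logit_scale=100.0):
    """Hypothetical helper: build a group of K candidate classes from a
    CLIP-style similarity distribution.

    image_emb: (D,) image embedding; text_embs: (C, D), one embedding per class prompt.
    Returns candidate class indices (most probable first) and their
    probabilities renormalized over the group.
    """
    # Cosine similarity: L2-normalize both sides, then take dot products.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)  # shape (C,)

    # Softmax over all classes (subtract the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Keep the K most probable classes as the output group (descending order).
    top = np.argsort(probs)[::-1][:k]
    group_probs = probs[top] / probs[top].sum()
    return top, group_probs
```

Renormalizing over the group turns the K candidates into a small categorical distribution. Group-wise rewards and policy updates can then be defined on that distribution.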
📝 Abstract
Group Relative Policy Optimization (GRPO) has recently shown strong performance in post-training large language models and vision-language models. This raises the question of whether GRPO can also substantially improve test-time adaptation (TTA) of vision-language models. In this paper, we propose Group Relative Policy Optimization for Test-Time Adaptation (GRPO-TTA), which adapts GRPO to the TTA setting by reformulating class-specific prompt prediction as a group-wise policy optimization problem. Specifically, we construct output groups by sampling top-K class candidates from CLIP similarity distributions, enabling probability-driven optimization without access to ground-truth labels. Moreover, we design reward functions tailored to test-time adaptation, including alignment rewards and dispersion rewards, to guide effective tuning of the visual encoder. Extensive experiments across diverse benchmarks demonstrate that GRPO-TTA consistently outperforms existing test-time adaptation methods, with notably larger performance gains under natural distribution shifts.
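GRPO does not use a learned value baseline. Instead, it standardizes each group member's reward by the mean and standard deviation of the rewards within its own group. The NumPy sketch below shows that advantage computation and a simplified surrogate loss. The function names are hypothetical, and the reward values in any usage are placeholders. The abstract names alignment and dispersion rewards but does not define them here, so no particular formula is assumed. The clipped probability ratio and the KL penalty of the full GRPO objective are omitted.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward against its own group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def grpo_surrogate_loss(log_probs, rewards):
    """Simplified policy-gradient loss for one group (no clipping, no KL term).

    log_probs: log-probabilities the current model assigns to each group member.
    rewards:   one scalar reward per member (e.g., some combination of alignment
               and dispersion rewards; their exact form is an assumption here).
    """
    adv = group_relative_advantages(rewards)
    # Advantages are treated as constants; minimizing this loss raises the
    # log-probability of above-average members and lowers below-average ones.
    return -np.mean(adv * np.asarray(log_probs, dtype=float))
```

In a real training loop, `log_probs` would be differentiable tensors produced by the visual encoder being tuned, and the loss would be backpropagated. NumPy is used here only to show the arithmetic.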
Problem

Research questions and friction points this paper is trying to address.

Test-Time Adaptation
Vision-Language Models
Distribution Shift
Label-Free Adaptation
Visual Tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Adaptation
Vision-Language Models
Group Relative Policy Optimization
Prompt Tuning
Reinforcement Learning