TAPS: Frustratingly Simple Test Time Active Learning for VLMs

📅 2025-07-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses real-time adaptation of vision-language models (VLMs) to continuous data streams under stringent single-sample latency and memory constraints. Methodologically, it introduces the first test-time active learning framework for streaming VLMs, integrating active sample selection with prompt optimization: high-uncertainty samples are dynamically selected via entropy-based thresholds for oracle querying, while a class-balanced memory buffer and class-aware distribution alignment enable efficient online model updates requiring only a single gradient step, without full retraining. Its key contribution is enabling low-latency, memory-efficient test-time adaptation in the challenging single-sample streaming setting. Extensive experiments across 10 cross-dataset transfer and 4 domain generalization tasks demonstrate consistent improvements over state-of-the-art methods, with a favorable trade-off between accuracy gains and deployment efficiency.
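The entropy-based querying decision could look like the following minimal pure-Python sketch. It uses a running-mean entropy as the dynamic threshold; the scale factor `rho` and the running-mean rule are illustrative assumptions, not the paper's exact formulation.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class DynamicThreshold:
    """Query the oracle when a sample's predictive entropy exceeds a
    scaled running mean of all entropies seen so far in the stream.
    `rho` is a hypothetical scaling knob, not taken from the paper."""

    def __init__(self, rho=1.0):
        self.rho = rho
        self.total = 0.0   # sum of entropies seen so far
        self.count = 0     # number of samples seen so far

    def should_query(self, probs):
        h = entropy(probs)
        self.total += h
        self.count += 1
        # Query only if this sample is more uncertain than the stream average.
        return h > self.rho * (self.total / self.count)
```

A confident prediction (one class near probability 1) falls below the running average and is skipped, while a near-uniform prediction exceeds it and triggers an oracle query; in the actual method the queried label would then drive the single prompt-update gradient step.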

📝 Abstract
Test-Time Optimization enables models to adapt to new data during inference by updating parameters on-the-fly. Recent advances in Vision-Language Models (VLMs) have explored learning prompts at test time to improve performance in downstream tasks. In this work, we extend this idea by addressing a more general and practical challenge: Can we effectively utilize an oracle in a continuous data stream where only one sample is available at a time, requiring an immediate query decision while respecting latency and memory constraints? To tackle this, we propose a novel Test-Time Active Learning (TTAL) framework that adaptively queries uncertain samples and updates prompts dynamically. Unlike prior methods that assume batched data or multiple gradient updates, our approach operates in a real-time streaming scenario with a single test sample per step. We introduce a dynamically adjusted entropy threshold for active querying, a class-balanced replacement strategy for memory efficiency, and a class-aware distribution alignment technique to enhance adaptation. The design choices are justified using careful theoretical analysis. Extensive experiments across 10 cross-dataset transfer benchmarks and 4 domain generalization datasets demonstrate consistent improvements over state-of-the-art methods while maintaining reasonable latency and memory overhead. Our framework provides a practical and effective solution for real-world deployment in safety-critical applications such as autonomous systems and medical diagnostics.
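The class-aware distribution alignment mentioned in the abstract can be pictured as keeping a per-class running mean of features and penalizing its distance to a fixed per-class reference (e.g. text-prompt features). This is a minimal sketch under that assumption; the names and the squared-distance penalty are illustrative, not the paper's exact objective.

```python
class ClassAlignment:
    """Track per-class running feature means and compute an alignment
    penalty as the average squared distance to per-class reference means.
    `ref_means` maps class_id -> reference feature vector (assumed given)."""

    def __init__(self, ref_means):
        self.ref = ref_means
        self.sums = {c: [0.0] * len(v) for c, v in ref_means.items()}
        self.counts = {c: 0 for c in ref_means}

    def update(self, cls, feat):
        """Fold one observed feature vector into the class's running sum."""
        self.counts[cls] += 1
        self.sums[cls] = [s + f for s, f in zip(self.sums[cls], feat)]

    def penalty(self):
        """Mean squared distance between observed and reference class means."""
        total, seen = 0.0, 0
        for c, n in self.counts.items():
            if n == 0:
                continue  # skip classes not yet observed in the stream
            mean = [s / n for s in self.sums[c]]
            total += sum((m - r) ** 2 for m, r in zip(mean, self.ref[c]))
            seen += 1
        return total / seen if seen else 0.0
```

In the actual framework such a penalty would be added to the querying loss so the single gradient step both fits the queried label and keeps class feature statistics aligned.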
Problem

Research questions and friction points this paper is trying to address.

Active learning for VLMs in real-time streaming data
Adaptive querying of uncertain samples under constraints
Dynamic prompt updates with single-sample optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Active Learning framework
Dynamic entropy threshold for querying
Class-balanced memory replacement strategy
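The class-balanced memory replacement strategy listed above can be sketched as a fixed-capacity buffer that, when full, evicts from the currently largest class so no single class dominates memory. The capacity handling and FIFO eviction order within a class are illustrative choices, not details confirmed by the paper.

```python
from collections import defaultdict

class ClassBalancedBuffer:
    """Fixed-capacity sample buffer. On overflow, evict one sample from
    the class currently holding the most entries, keeping class counts
    balanced as the stream drifts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = defaultdict(list)  # class_id -> list of stored samples

    def __len__(self):
        return sum(len(v) for v in self.slots.values())

    def add(self, cls, sample):
        if len(self) >= self.capacity:
            # Evict the oldest sample from the most-represented class.
            biggest = max(self.slots, key=lambda c: len(self.slots[c]))
            self.slots[biggest].pop(0)
        self.slots[cls].append(sample)
```

With capacity 3, adding three samples of one class and then one of another evicts from the overrepresented class, leaving a 2/1 split rather than crowding out the new class.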