Learning What Helps: Task-Aligned Context Selection for Vision Tasks

📅 2025-11-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision Transformers (ViTs) cannot, on their own, identify which context examples actually benefit a prediction. Retrieval based on superficial visual similarity often selects context that is misaligned with the downstream objective. To address this, we propose Task-Aligned Context Selection (TACS), a framework that jointly optimizes example selection and downstream task performance. TACS trains a selection network end-to-end alongside the ViT-based task model. It uses a hybrid optimization strategy that combines gradient-based supervision with reinforcement learning, so that the selector learns to prioritize task-relevant examples over merely visually similar ones. Across 18 benchmarks spanning fine-grained recognition, medical image classification, and medical image segmentation, TACS consistently outperforms conventional similarity-based retrieval. The gains are largest under data scarcity and high uncertainty, which establishes context selection as a learnable, task-driven mechanism.
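
The contrast the summary draws between visual similarity and task relevance can be made concrete with a toy example. The sketch below is a minimal illustration, not the paper's method. It ranks three hypothetical candidate context examples in two ways. The first is cosine similarity of feature vectors, as in conventional retrieval. The second is a task-aligned reward, defined here as an assumption to be the drop in task loss when the candidate is paired with the query. All feature vectors and loss values are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_ranking(query, candidates):
    """Conventional retrieval: order candidate indices by cosine similarity to the query."""
    return sorted(range(len(candidates)),
                  key=lambda i: cosine(query, candidates[i]), reverse=True)

def task_aligned_ranking(loss_without, losses_with):
    """Order candidate indices by reward = task-loss reduction when paired with the query.

    Assumption for illustration: reward = loss(query alone) - loss(query + candidate).
    """
    rewards = [loss_without - loss for loss in losses_with]
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    return order, rewards

# Hypothetical toy values (not results from the paper).
query = [1.0, 0.2]
candidates = [[1.0, 0.25], [0.6, 0.8], [0.0, 1.0]]
loss_without = 0.69                # task loss on the query with no context
losses_with = [0.70, 0.35, 0.68]   # task loss when each candidate is supplied as context

print(similarity_ranking(query, candidates))              # → [0, 1, 2]
print(task_aligned_ranking(loss_without, losses_with)[0])  # → [1, 2, 0]
```

In this toy setup, the most similar candidate (index 0) slightly increases the loss, so it ranks last under the task-aligned criterion. Candidate 1 ranks first. It is the kind of example a task-aligned selector is meant to prefer.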

📝 Abstract

Humans often resolve visual uncertainty by comparing an image with relevant examples, but ViTs lack the ability to identify which examples would improve their predictions. We present Task-Aligned Context Selection (TACS), a framework that learns to select paired examples which truly improve task performance rather than those that merely appear similar. TACS jointly trains a selector network with the task model through a hybrid optimization scheme combining gradient-based supervision and reinforcement learning, making retrieval part of the learning objective. By aligning selection with task rewards, TACS enables discriminative models to discover which contextual examples genuinely help. Across 18 datasets covering fine-grained recognition, medical image classification, and medical image segmentation, TACS consistently outperforms similarity-based retrieval, particularly in challenging or data-limited settings.
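
The abstract does not spell out the selector's update rule. The code below is a rough sketch of the reinforcement-learning half of such a scheme. It trains a softmax selector over a fixed set of candidate context examples with REINFORCE. The reward for a sampled candidate is the reduction in task loss that the candidate produces. Everything here is an illustrative assumption. The tabular `logits` stand in for a selector network, and `TOY_REWARDS` are made-up values rather than anything reported in the paper. The gradient-based supervision term of the hybrid scheme is not reproduced.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, reward_fn, rng, lr=0.5, baseline=0.0, momentum=0.9):
    """One REINFORCE update of a softmax selector over candidate context examples.

    Only the reward of the sampled candidate is observed (bandit-style feedback).
    Returns the updated logits and an exponential-moving-average reward baseline.
    """
    probs = softmax(logits)
    k = rng.choices(range(len(logits)), weights=probs)[0]  # sample one candidate
    reward = reward_fn(k)
    advantage = reward - baseline
    # Gradient of log p_k with respect to logit i is 1[i == k] - p_i; ascend it.
    new_logits = [z + lr * advantage * ((1.0 if i == k else 0.0) - p)
                  for i, (z, p) in enumerate(zip(logits, probs))]
    new_baseline = momentum * baseline + (1.0 - momentum) * reward
    return new_logits, new_baseline

# Hypothetical toy rewards: task-loss reduction from pairing each candidate with one query.
# Candidate 0 plays the role of the most visually similar example, which does not help.
TOY_REWARDS = [-0.01, 0.34, 0.01]

def train_selector(steps=500, seed=0):
    """Run REINFORCE on the toy rewards and return the final selection probabilities."""
    rng = random.Random(seed)
    logits, baseline = [0.0, 0.0, 0.0], 0.0
    for _ in range(steps):
        logits, baseline = reinforce_step(logits, lambda k: TOY_REWARDS[k], rng,
                                          baseline=baseline)
    return softmax(logits)

probs = train_selector()
print([round(p, 3) for p in probs])
```

The moving-average baseline makes the advantage negative when an unhelpful candidate is sampled. Probability mass should therefore shift toward candidate 1, the one with the largest loss reduction, and away from the "similar" candidate 0.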

Problem

Research questions and friction points this paper is trying to address.

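How can a model select context examples that improve task performance, rather than examples that merely look similar?

How can the selector be trained jointly with the task model so that retrieval becomes part of the learning objective?

Does learned selection help across diverse vision tasks, especially in challenging or data-limited settings?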

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns to select task-improving examples, not just similar ones

Trains the selector network jointly with the task model using hybrid gradient-based and reinforcement-learning optimization

Aligns example selection with task rewards for better performance

🔎 Similar Papers

Cropper: Vision-Language Model for Image Cropping through In-Context Learning