PANICL: Mitigating Over-Reliance on Single Prompt in Visual In-Context Learning

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual in-context learning (VICL) suffers from prediction bias and instability because it relies on a single context example. To address this, we propose PANICL, a training-free, general-purpose visual in-context learning framework. The method uses image-patch-level k-nearest-neighbor retrieval to select multiple relevant in-context examples, then combines dynamic weighting with feature-space alignment to smoothly fuse the prediction scores across them. This design mitigates single-example bias, substantially improving prediction stability as well as cross-task and cross-model generalization. Extensive experiments show that PANICL consistently outperforms strong baselines across diverse vision tasks, including foreground segmentation, single object detection, colorization, and keypoint detection, while remaining robust under dataset shift and label-space changes. These results support its generality, scalability, and practical applicability in real-world vision systems.
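The retrieval step described above can be sketched in a few lines. The function below is a hypothetical illustration (the name `retrieve_topk_examples` and the feature shapes are assumptions, not the paper's actual API): each query patch is matched against every candidate example's patches by cosine similarity, and the candidates with the best aggregate patch-level match are kept as in-context pairs.

```python
import numpy as np

def retrieve_topk_examples(query_patches, pool_patches, k=3):
    """Hypothetical sketch of image-patch-level k-NN retrieval.

    query_patches: (P, D) patch features of the query image.
    pool_patches:  (N, Q, D) patch features of N candidate in-context examples.
    Returns indices and scores of the k best-matching examples.
    """
    # L2-normalise so dot products become cosine similarities.
    q = query_patches / np.linalg.norm(query_patches, axis=-1, keepdims=True)
    p = pool_patches / np.linalg.norm(pool_patches, axis=-1, keepdims=True)
    # Similarity of every query patch to every pool patch, per candidate.
    sims = np.einsum('pd,nqd->npq', q, p)   # (N, P, Q)
    # Each query patch takes its best match; average over patches per image.
    per_patch_best = sims.max(axis=-1)      # (N, P)
    scores = per_patch_best.mean(axis=-1)   # (N,)
    topk = np.argsort(-scores)[:k]
    return topk, scores[topk]
```

The retrieved examples would then each produce a prediction, to be fused downstream rather than trusting any single one.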

📝 Abstract
Visual In-Context Learning (VICL) uses input-output image pairs, referred to as in-context pairs (or examples), as prompts alongside query images to guide models in performing diverse vision tasks. However, VICL often suffers from over-reliance on a single in-context pair, which can lead to biased and unstable predictions. We introduce PAtch-based $k$-Nearest neighbor visual In-Context Learning (PANICL), a general training-free framework that mitigates this issue by leveraging multiple in-context pairs. PANICL smooths assignment scores across pairs, reducing bias without requiring additional training. Extensive experiments on a variety of tasks, including foreground segmentation, single object detection, colorization, multi-object segmentation, and keypoint detection, demonstrate consistent improvements over strong baselines. Moreover, PANICL exhibits strong robustness to domain shifts, including dataset-level shift (e.g., from COCO to Pascal) and label-space shift (e.g., FSS-1000), and generalizes well to other VICL models such as SegGPT, Painter, and LVM, highlighting its versatility and broad applicability.
Problem

Research questions and friction points this paper is trying to address.

Mitigating over-reliance on single in-context pairs
Reducing biased and unstable predictions in VICL
Improving robustness to domain shifts across tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages multiple in-context pairs for predictions
Smooths assignment scores to reduce bias
Training-free framework for visual tasks
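The score-smoothing idea in these bullets can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: `smooth_scores` is a hypothetical name, and the weights stand in for whatever relevance signal (e.g. retrieval similarity) the framework uses to weight each in-context pair.

```python
import numpy as np

def smooth_scores(score_maps, weights=None):
    """Hypothetical sketch of smoothing assignment scores across pairs.

    score_maps: (K, H, W, C) per-pair assignment scores from K in-context pairs.
    weights:    optional (K,) relevance weights; uniform if omitted.
    """
    k = score_maps.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)         # plain averaging
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()     # normalise to a distribution
    # Weighted average over the K pairs dilutes any single pair's bias.
    fused = np.tensordot(weights, score_maps, axes=1)  # (H, W, C)
    return fused.argmax(axis=-1)                       # per-pixel label map
```

Because the fusion is a simple weighted average over precomputed scores, it adds no trainable parameters, which is consistent with the training-free claim above.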