🤖 AI Summary
In vision-language prompt tuning, freezing the visual encoder often leads to feature misalignment and inter-class confusion. To address this, we propose a unified framework that balances task specialization with cross-domain generalization. Our method introduces two key components: (1) Confusion-Aware Loss (CoA-loss), which refines decision boundaries between confusing classes to improve specialization; and (2) Confidence-Aware weights (CoA-weights), which weight each prediction in a mixture model by its confidence within the class domains, with a mathematical argument that such a mixture enhances generalization without compromising specialization. Crucially, our approach requires no fine-tuning of the visual encoder, preserving its pre-trained semantics while enhancing adaptability. Extensive experiments demonstrate significant improvements in specialization and generalization, achieving state-of-the-art results on multiple benchmarks. The implementation is publicly available.
📝 Abstract
Prompt tuning, which adapts vision-language models by freezing model parameters and optimizing only the prompt, has proven effective for task-specific adaptation. The core challenge in prompt tuning is improving both specialization for a specific task and generalization to unseen domains. However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. Additionally, we mathematically demonstrate that a mixture model can enhance generalization without compromising specialization. This is achieved using confidence-aware weights (CoA-weights), which adjust the weight of each prediction in the mixture model based on its confidence within the class domains. Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix.
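The confidence-weighted mixture idea can be illustrated with a minimal sketch. Note that the function names, the per-class weight matrix, and the softmax-based combination below are illustrative assumptions for exposition, not the paper's actual CoA-weights formulation:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over class logits.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_predict(logits_list, conf_weights):
    """Combine per-prompt predictions with per-class confidence weights.

    logits_list: list of (num_classes,) logit vectors, one per prompt.
    conf_weights: (num_prompts, num_classes) nonnegative weights modeling
        each prompt's confidence within each class domain (illustrative).
    """
    probs = np.stack([softmax(l) for l in logits_list])        # (M, C)
    w = conf_weights / conf_weights.sum(axis=0, keepdims=True)  # normalize per class
    mixed = (w * probs).sum(axis=0)                             # class-wise weighted mixture
    return mixed / mixed.sum()                                  # renormalize to a distribution

# Example: a task-specialized prompt and a generalist prompt over 3 classes.
specialized = np.array([2.0, 0.5, -1.0])
generalist = np.array([1.0, 1.0, 1.0])
# Weight the specialized prompt higher on classes it is confident about.
weights = np.array([[0.9, 0.9, 0.2],
                    [0.1, 0.1, 0.8]])
p = mixture_predict([specialized, generalist], weights)
```

The design intuition being sketched: within the task's class domains the specialized prediction dominates (preserving specialization), while elsewhere the mixture falls back toward the generalist prediction (preserving generalization).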