Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the loss of tail-class discriminability that vision-language models suffer when prompt-tuned on class-imbalanced data. The authors propose cluster-aware neural collapse prompt tuning (CPT), which builds a cluster-invariant space from semantic clusters mined from the pre-trained model and optimizes the geometry within each cluster using three losses: a textual Equiangular Tight Frame (ETF) separation loss, a class-wise convergence loss, and a rotation stabilization loss. Restricting these constraints to local clusters improves inter-class separation and intra-class alignment while limiting interference with the pre-trained semantic structure. Experiments on eleven datasets show that CPT outperforms state-of-the-art approaches, with stronger performance on long-tail classes and good generalization to unseen classes.

📝 Abstract

Prompt learning has emerged as an efficient alternative to fine-tuning pre-trained vision-language models (VLMs). Despite its promise, current methods still struggle to maintain tail-class discriminability when adapting to class-imbalanced datasets. In this work, we propose cluster-aware neural collapse prompt tuning (CPT), which enhances the discriminability of tail classes in prompt-tuned VLMs without sacrificing their overall generalization. First, we design a cluster-invariant space by mining semantic assignments from the pre-trained VLM and mapping them to prompt-tuned features. This computes cluster-level boundaries and restricts the constraints to local neighborhoods, which reduces interference with the global semantic structure of the pre-trained VLM. Second, we introduce neural-collapse-driven discriminability optimization with three losses: textual Equiangular Tight Frame (ETF) separation loss, class-wise convergence loss, and rotation stabilization loss. These losses work together to shape intra-cluster geometry for better inter-class separation and intra-class alignment. Extensive experiments on 11 diverse datasets demonstrate that CPT outperforms SOTA methods, with stronger performance on long-tail classes and good generalization to unseen classes.
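
For readers unfamiliar with the geometry behind the ETF separation loss, here is a minimal NumPy sketch of a simplex Equiangular Tight Frame: K unit vectors whose pairwise cosine similarity is exactly -1/(K-1), the most evenly separated arrangement possible. The abstract names the loss but does not give its formula, so `simplex_etf` and `etf_separation_penalty` are hypothetical helpers written for illustration only; the penalty is one plausible way to measure how far a set of class features within a cluster is from ETF geometry, not the CPT objective itself.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (num_classes, dim) matrix whose rows form a simplex ETF.

    Illustrative helper (not from the paper). Assumes dim >= num_classes >= 2.
    """
    k = num_classes
    if k < 2 or dim < k:
        raise ValueError("this construction assumes dim >= num_classes >= 2")
    rng = np.random.default_rng(seed)
    # Reduced QR gives a (dim, k) matrix with orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    # Centering matrix removes the shared mean direction of the k vectors.
    centering = np.eye(k) - np.ones((k, k)) / k
    etf = np.sqrt(k / (k - 1)) * (u @ centering)  # shape (dim, k)
    return etf.T


def etf_separation_penalty(class_features: np.ndarray) -> float:
    """Mean squared gap between pairwise cosines and the ETF target -1/(K-1).

    Assumed form of a separation penalty, for illustration only.
    """
    k = class_features.shape[0]
    norms = np.linalg.norm(class_features, axis=1, keepdims=True)
    normed = class_features / norms
    cosines = normed @ normed.T
    off_diagonal = cosines[~np.eye(k, dtype=bool)]
    target = -1.0 / (k - 1)
    return float(np.mean((off_diagonal - target) ** 2))


if __name__ == "__main__":
    etf = simplex_etf(num_classes=4, dim=16)
    print("row norms:", np.round(np.linalg.norm(etf, axis=1), 6))
    print("penalty for exact ETF:", etf_separation_penalty(etf))
    # Nearly identical class features are poorly separated, so the penalty is large.
    crowded = np.ones((4, 16)) + 0.01 * np.random.default_rng(1).standard_normal((4, 16))
    print("penalty for crowded features:", etf_separation_penalty(crowded))
```

In a cluster-aware setting such as the one the abstract describes, a penalty like this would presumably be applied only to the classes inside one semantic cluster rather than to all classes at once, which is what keeps the constraint local.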

Problem

Research questions and friction points this paper is trying to address.

long-tailed generalization

vision-language models

prompt tuning

class imbalance

tail-class discriminability

Innovation

Methods, ideas, or system contributions that make the work stand out.

cluster-aware

neural collapse

prompt tuning