Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models

📅 2025-08-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited generalization of vision-language models (e.g., CLIP) in cross-domain few-shot image recognition, this paper proposes Consistency-guided Multi-view Collaborative Optimization (CoMuCo). CoMuCo employs a dual-expert architecture to extract complementary multi-view features and combines prior-knowledge-based consistency constraints with an information-geometry-based consensus mechanism to enforce inter-view consistency regularization. The paper also establishes a new benchmark for evaluating cross-domain few-shot learning on imaging domains distinct from natural images. Experiments indicate that CoMuCo consistently outperforms state-of-the-art methods across existing and newly proposed cross-domain few-shot benchmarks, improving robustness and generalization under domain shift. These results support the effectiveness of multi-view collaboration and geometric consistency modeling for cross-domain few-shot recognition.

📝 Abstract
Vision-language models (VLMs) pre-trained on natural image and language data, such as CLIP, have exhibited significant potential in few-shot image recognition tasks, leading to the development of various efficient transfer learning methods. These methods exploit the inherent pre-learned knowledge in VLMs and have achieved strong performance on standard image datasets. However, their effectiveness is often limited in cross-domain tasks where the imaging domain differs from natural images. To address this limitation, we propose Consistency-guided Multi-view Collaborative Optimization (CoMuCo), a novel fine-tuning strategy for VLMs. This strategy employs two functionally complementary expert modules to extract multi-view features, while incorporating prior knowledge-based consistency constraints and information geometry-based consensus mechanisms to enhance the robustness of feature learning. Additionally, a new cross-domain few-shot benchmark is established to help comprehensively evaluate methods on imaging domains distinct from natural images. Extensive empirical evaluations on both existing and newly proposed benchmarks suggest that CoMuCo consistently outperforms current methods on few-shot tasks. The code and benchmark will be released.
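The abstract's inter-view consistency idea can be illustrated with a minimal sketch. Note this is an assumption-laden toy example, not the paper's actual formulation: a symmetric KL divergence between the class distributions of two expert "views" is one common way to penalize disagreement, and the logits below are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two class distributions.

    A small epsilon guards against log(0) for near-zero probabilities.
    """
    p, q = p + eps, q + eps
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Hypothetical class logits produced by two expert modules for one image
view_a = softmax(np.array([2.0, 0.5, -1.0]))
view_b = softmax(np.array([1.8, 0.7, -0.9]))

# A consistency regularizer of this kind shrinks toward zero
# as the two views agree, and grows as they diverge.
consistency_loss = symmetric_kl(view_a, view_b)
```

In a fine-tuning loop, such a term would typically be added to the task loss with a weighting coefficient, encouraging the two experts to converge on compatible predictions.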
Problem

Research questions and friction points this paper is trying to address.

Enhancing cross-domain few-shot learning with VLMs
Improving robustness in feature learning for diverse imaging domains
Establishing benchmarks for cross-domain few-shot evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view feature extraction with expert modules
Consistency constraints enhance feature robustness
Information geometry-based consensus mechanisms
Dexia Chen
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Wentao Zhang
Institute of Physics, Chinese Academy of Sciences
photoemission, superconductivity, cuprate, HTSC, time-resolved
Qianjie Zhu
School of Computer, Electronics and Information, Guangxi University, Nanning, China
Ping Hu
UESTC
Computer Vision, Deep Learning, Image/Video Processing
Weibing Li
School of Computer Science and Engineering, Sun Yat-sen University
Neural Networks, Robotics, Automatic Control
Tong Zhang
Peng Cheng Laboratory, Shenzhen, China
Ruixuan Wang
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China