Closing the Confusion Loop: CLIP-Guided Alignment for Source-Free Domain Adaptation

πŸ“… 2026-02-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of asymmetric and dynamic class confusion in the target domain caused by inter-class visual similarity under source-free domain adaptation settings, where source data are unavailable. To tackle this issue, the authors propose the CLIP-Guided Alignment (CGA) framework, which explicitly models and leverages such confusion to enhance pseudo-label quality and classification performance. CGA introduces a Multi-directional Confusion Awareness (MCA) module to detect directional confusion pairs, a Misclassification-aware CLIP Prompting (MCC) module to generate confusion-aware textual prompts for CLIP, and a Feature Alignment Module (FAM) that aligns source-model features with CLIP’s confusion-guided representations via contrastive learning. Extensive experiments demonstrate that CGA significantly outperforms existing source-free domain adaptation methods, particularly excelling in fine-grained and high-confusion scenarios.
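The summary above describes two of CGA's steps in enough detail to sketch: MCA mines directional confusion pairs from the source model's predictions on target data, and MCC turns those pairs into confusion-aware prompts. The snippet below is a minimal numpy sketch of that idea, not the authors' implementation: the runner-up-probability criterion, the `threshold` value, and the prompt template are all assumptions made for illustration.

```python
import numpy as np

def directional_confusion_pairs(probs, threshold=0.15):
    """Estimate asymmetric confusion pairs from source-model soft
    predictions on target samples (a simplified stand-in for the MCA
    module; the paper's actual detection rule is not reproduced here).

    probs: (N, C) softmax outputs of the source model.
    Returns sorted (predicted_class, confused_with) index pairs where
    the runner-up class receives a large share of probability mass.
    Direction matters: (a, b) and (b, a) are distinct pairs.
    """
    pairs = set()
    for p in probs:
        order = np.argsort(p)[::-1]      # classes by descending score
        first, second = order[0], order[1]
        if p[second] >= threshold:       # runner-up is a serious rival
            pairs.add((int(first), int(second)))
    return sorted(pairs)

def confusion_prompts(pairs, class_names):
    """Build confusion-aware text prompts in the style the abstract
    quotes ('a truck that looks like a bus'); the exact template used
    by MCC is an assumption."""
    return [f"a photo of a {class_names[a]} that looks like a {class_names[b]}"
            for a, b in pairs]

# Toy example: 3 target samples over classes [bus, truck, car].
probs = np.array([[0.55, 0.40, 0.05],   # bus vs. truck is ambiguous
                  [0.10, 0.85, 0.05],   # confident truck
                  [0.48, 0.06, 0.46]])  # bus vs. car is ambiguous
names = ["bus", "truck", "car"]
pairs = directional_confusion_pairs(probs)
prompts = confusion_prompts(pairs, names)
```

In a full pipeline these prompts would be fed through CLIP's text encoder to score each ambiguous sample against its confusion-aware descriptions, which is what makes the resulting pseudo-labels context-sensitive.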

πŸ“ Abstract
Source-Free Domain Adaptation (SFDA) tackles the problem of adapting a pre-trained source model to an unlabeled target domain without accessing any source data, which makes it well suited to data-security-sensitive settings. Although recent advances have shown that pseudo-labeling strategies can be effective, they often fail in fine-grained scenarios due to subtle inter-class similarities. A critical but underexplored issue is the presence of asymmetric and dynamic class confusion, where visually similar classes are unequally and inconsistently misclassified by the source model. Existing methods typically ignore such confusion patterns, leading to noisy pseudo-labels and poor target discrimination. To address this, we propose CLIP-Guided Alignment (CGA), a novel framework that explicitly models and mitigates class confusion in SFDA. Our method consists of three parts: (1) MCA: first detects directional confusion pairs by analyzing the predictions of the source model in the target domain; (2) MCC: leverages CLIP to construct confusion-aware textual prompts (e.g., "a truck that looks like a bus"), enabling more context-sensitive pseudo-labeling; and (3) FAM: builds confusion-guided feature banks for both CLIP and the source model and aligns them using contrastive learning to reduce ambiguity in the representation space. Extensive experiments on various datasets demonstrate that CGA consistently outperforms state-of-the-art SFDA methods, with especially notable gains in confusion-prone and fine-grained scenarios. Our results highlight the importance of explicitly modeling inter-class confusion for effective source-free adaptation. Our code can be found at https://github.com/soloiro/CGA
Problem

Research questions and friction points this paper is trying to address.

Source-Free Domain Adaptation
Class Confusion
Fine-Grained Recognition
Pseudo-Labeling
Domain Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Source-Free Domain Adaptation
Class Confusion
CLIP-Guided Alignment
Pseudo-Labeling
Contrastive Learning
πŸ”Ž Similar Papers
No similar papers found.
Shanshan Wang
AnHui University
Domain Adaptation · Domain Generalization · AI for Education
Ziying Feng
State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology, Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
Xiaozheng Shen
State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology, Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
Xun Yang
Department of Electronic Engineering and Information Science, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
Pichao Wang
Amazon, U.S.A.
Zhenwei He
College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400044, China
Xingyi Zhang
MBZUAI
graph representation learning · AI4Science · geometric deep learning