Text-Driven Causal Representation Learning for Source-Free Domain Generalization

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning models suffer from degraded generalization when training and test distributions differ. Conventional domain generalization (DG) relies on multiple source domains, incurring high data acquisition costs; emerging source-free DG (SFDG) leverages vision-language models (e.g., CLIP) with text prompts to guide visual representation learning, alleviating data dependency but failing to eliminate domain-specific confounding factors. This work pioneers the integration of causal inference into the SFDG framework: we construct a confounder dictionary and design a text-driven causal intervention network to disentangle and strengthen domain-invariant features; further, we enhance style word embeddings and impose causal regularization. Extensive experiments on PACS, VLCS, OfficeHome, and DomainNet demonstrate substantial improvements over state-of-the-art SFDG methods, validating the efficacy of causally grounded representations for cross-domain generalization.

📝 Abstract
Deep learning often struggles when training and test data distributions differ. Traditional domain generalization (DG) tackles this by including data from multiple source domains, which is impractical due to expensive data collection and annotation. Recent vision-language models like CLIP enable source-free domain generalization (SFDG) by using text prompts to simulate visual representations, reducing data demands. However, existing SFDG methods struggle with domain-specific confounders, limiting their generalization capabilities. To address this issue, we propose TDCRL (Text-Driven Causal Representation Learning), the first method to integrate causal inference into the SFDG setting. TDCRL operates in two steps: first, it employs data augmentation to generate style word vectors and combines them with class information to produce text embeddings that simulate visual representations; second, it trains a causal intervention network with a confounder dictionary to extract domain-invariant features. Grounded in causal learning, our approach offers a clear and effective mechanism for extracting domain-invariant features, ensuring robust generalization. Extensive experiments on PACS, VLCS, OfficeHome, and DomainNet show state-of-the-art performance, proving TDCRL's effectiveness in SFDG.
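The two-step pipeline in the abstract can be sketched in miniature. This is a hypothetical toy illustration, not the paper's implementation: random vectors stand in for CLIP text features, the "style words" are simple additive perturbations, and the learned causal intervention network is replaced by a projection that removes confounder-dictionary directions. All function names and dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension (real CLIP text features are 512- or 768-d)

def simulate_text_embeddings(class_embs, n_styles=4, noise=0.1):
    """Step 1 (sketch): augment with random 'style word' vectors and combine
    them with class embeddings to simulate visual representations."""
    styles = rng.normal(scale=noise, size=(n_styles, DIM))
    # one simulated embedding per (style, class) pair
    sims = class_embs[None, :, :] + styles[:, None, :]
    return sims.reshape(-1, DIM)

def causal_intervention(feats, confounders):
    """Step 2 (sketch): stand-in for the causal intervention network —
    project features onto the orthogonal complement of the confounder
    dictionary's span, removing confounder-aligned components."""
    q, _ = np.linalg.qr(confounders.T)   # orthonormal basis of confounder subspace
    return feats - (feats @ q) @ q.T

class_embs = rng.normal(size=(3, DIM))   # 3 toy classes
confounders = rng.normal(size=(2, DIM))  # 2 toy confounder dictionary entries
feats = simulate_text_embeddings(class_embs)
invariant = causal_intervention(feats, confounders)
# the resulting features carry no component along any confounder direction
print(np.abs(invariant @ confounders.T).max() < 1e-8)  # → True
```

The projection here is only the simplest linear instance of "removing confounder-specific variation"; the paper trains a network to perform the intervention rather than using a fixed projection.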
Problem

Research questions and friction points this paper is trying to address.

Addresses domain-specific confounders in source-free domain generalization
Integrates causal inference to extract domain-invariant features
Reduces data demands using text-driven visual representation simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses text prompts to simulate visual representations
Integrates causal inference into domain generalization
Trains causal network with confounder dictionary
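Causal interventions of the kind these bullets describe are usually formalized with the standard backdoor adjustment. Assuming the confounder dictionary plays the role of the confounder set (the paper's exact objective may differ), the adjustment marginalizes out the confounders:

```latex
P(Y \mid \mathrm{do}(X)) \;=\; \sum_{c \in \mathcal{C}} P(Y \mid X, c)\, P(c)
```

Here \(X\) is the (simulated) visual representation, \(Y\) the class label, and \(\mathcal{C}\) the confounder dictionary; conditioning on each dictionary entry and averaging blocks the backdoor path through domain-specific confounders.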