🤖 AI Summary
Deep learning models suffer from degraded generalization when training and test distributions differ. Conventional domain generalization (DG) relies on multiple source domains, incurring high data acquisition costs; emerging source-free DG (SFDG) leverages vision-language models (e.g., CLIP) with text prompts to guide visual representation learning, alleviating data dependency but failing to eliminate domain-specific confounding factors. This work pioneers the integration of causal inference into the SFDG framework: we construct a confounder dictionary and design a text-driven causal intervention network to disentangle and strengthen domain-invariant features; further, we enhance style word embeddings and impose causal regularization. Extensive experiments on PACS, VLCS, OfficeHome, and DomainNet demonstrate substantial improvements over state-of-the-art SFDG methods, validating the efficacy of causally grounded representations for cross-domain generalization.
📝 Abstract
Deep learning often struggles when training and test data distributions differ. Traditional domain generalization (DG) tackles this by collecting data from multiple source domains, which is often impractical due to expensive data collection and annotation. Recent vision-language models such as CLIP enable source-free domain generalization (SFDG) by using text prompts to simulate visual representations, reducing data demands. However, existing SFDG methods struggle with domain-specific confounders, limiting their generalization capability. To address this issue, we propose TDCRL (**T**ext-**D**riven **C**ausal **R**epresentation **L**earning), the first method to integrate causal inference into the SFDG setting. TDCRL operates in two steps: first, it employs data augmentation to generate style word vectors and combines them with class information to produce text embeddings that simulate visual representations; second, it trains a causal intervention network with a confounder dictionary to extract domain-invariant features. Grounded in causal learning, our approach offers a clear and effective mechanism for learning domain-invariant features that generalize robustly. Extensive experiments on PACS, VLCS, OfficeHome, and DomainNet show state-of-the-art performance, demonstrating TDCRL's effectiveness in SFDG.
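The causal intervention step can be illustrated with a minimal backdoor-adjustment sketch: a feature is deconfounded by averaging over a confounder dictionary, weighted by a prior over confounders. All shapes, the uniform prior, and the attention-style weighting below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def backdoor_adjust(feature, confounders, prior):
    """Approximate P(Y | do(X)) via backdoor adjustment:
    mix the input feature with confounder prototypes, weighted
    by an attention score re-weighted with the prior P(c)."""
    attn = softmax(confounders @ feature)   # (K,) relevance of each confounder
    weights = prior * attn                  # re-weight by prior P(c)
    weights = weights / weights.sum()       # normalize to a distribution
    context = weights @ confounders         # (D,) expected confounder context
    return feature + context                # deconfounded feature

rng = np.random.default_rng(0)
D, K = 8, 4                                 # feature dim, dictionary size (assumed)
feat = rng.normal(size=D)
dictionary = rng.normal(size=(K, D))        # hypothetical confounder dictionary
prior = np.ones(K) / K                      # uniform prior over confounders
out = backdoor_adjust(feat, dictionary, prior)
```

In TDCRL the dictionary entries and the intervention network are learned, and the adjusted features are further shaped by causal regularization; this sketch only shows the deconfounding arithmetic.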