Target-Oriented Single Domain Generalization

📅 2025-08-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak generalization of models under distribution shifts in single-domain generalization (SDG), this paper proposes a Target-Oriented SDG (TO-SDG) paradigm that, for the first time, leverages textual descriptions of the target domain, without any target-domain data, as external semantic priors to guide generalization. Methodologically, the authors construct a target-anchored subspace grounded in vision-language models (e.g., CLIP), employ spectral projection to align source features with the target semantic space, and introduce vision-language distillation to suppress source-domain noise. The framework requires no target data, only lightweight textual metadata. Evaluated on multiple image classification and object detection benchmarks, TO-SDG consistently outperforms state-of-the-art SDG methods, empirically validating the efficacy of text-guided supervision for cross-distribution robustness.

📝 Abstract
Deep models trained on a single source domain often fail catastrophically under distribution shifts, a critical challenge in Single Domain Generalization (SDG). While existing methods focus on augmenting source data or learning invariant features, they neglect a readily available resource: textual descriptions of the target deployment environment. We propose Target-Oriented Single Domain Generalization (TO-SDG), a novel problem setup that leverages the textual description of the target domain, without requiring any target data, to guide model generalization. To address TO-SDG, we introduce Spectral TARget Alignment (STAR), a lightweight module that injects target semantics into source features by exploiting vision-language models (VLMs) such as CLIP. STAR uses a target-anchored subspace derived from the text embedding of the target description to recenter image features toward the deployment domain, then utilizes spectral projection to retain directions aligned with target cues while discarding source-specific noise. Moreover, we use vision-language distillation to align backbone features with the VLM's semantic geometry. STAR further employs feature-space Mixup to ensure smooth transitions between source and target-oriented representations. Experiments across various image classification and object detection benchmarks demonstrate STAR's superiority. This work establishes that minimal textual metadata, a practical and often overlooked resource, significantly enhances generalization under severe data constraints, opening new avenues for deploying robust models in target environments with unseen data.
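The core STAR steps described above (recenter image features toward a target text anchor, then spectrally project onto target-aligned directions) can be illustrated with a minimal numpy sketch. This is an assumption-laden reconstruction from the abstract alone, not the paper's implementation: the recentering rule, the SVD-based decomposition, and the choice of `k` retained directions are all hypothetical.

```python
import numpy as np

def star_project(feats, text_emb, k=8):
    """Illustrative sketch of target-anchored recentering + spectral projection.

    feats:    (N, D) source image features from the backbone
    text_emb: (D,)   VLM text embedding of the target-domain description
    k:        number of target-aligned spectral directions to keep (assumed)
    """
    # Unit-normalize the target text anchor.
    t = text_emb / np.linalg.norm(text_emb)
    # Recenter image features toward the target anchor (one plausible rule).
    centered = feats - feats.mean(axis=0) + t
    # Spectral decomposition of the recentered feature matrix.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    # Score each spectral direction by alignment with the target cue; keep
    # the k best-aligned directions, discard the rest as source-specific noise.
    align = np.abs(Vt @ t)
    keep = np.argsort(-align)[:k]
    P = Vt[keep].T @ Vt[keep]  # projector onto the target-aligned subspace
    return centered @ P
```

The projected features then live in a low-rank subspace oriented by the target description, which is the intuition behind "injecting target semantics into source features" without any target images.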
Problem

Research questions and friction points this paper is trying to address.

Leveraging target domain text descriptions to guide model generalization
Addressing catastrophic failure under distribution shifts in single source training
Enhancing generalization without requiring any target domain data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages target domain text descriptions without data
Uses spectral projection to retain target-aligned feature directions while discarding source-specific noise
Applies vision-language distillation and feature-space Mixup
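The feature-space Mixup mentioned in the last bullet, which smooths the transition between source and target-oriented representations, amounts to a convex combination of the two feature sets. A brief sketch, where the Beta-distribution parameter `alpha` is an assumed value rather than one reported by the paper:

```python
import numpy as np

def feature_mixup(src_feat, tgt_feat, alpha=0.2, rng=None):
    """Feature-space Mixup between source features and their
    target-oriented counterparts (sketch; alpha is an assumption)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    return lam * src_feat + (1.0 - lam) * tgt_feat
```

Because the result is elementwise between the two inputs, training on mixed features encourages the backbone to behave consistently along the path from source to target-oriented representations.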