🤖 AI Summary
To address insufficient model robustness and interpretability in single-source domain generalization caused by semantic shifts (e.g., background and viewpoint variations), this paper proposes a generalization paradigm grounded in locally domain-invariant concepts. Methodologically, we introduce two novel losses: a concept saliency alignment loss and a local concept contrastive loss. We further design an automated concept-localization annotation pipeline integrating a diffusion model (for feature-guided localization) and a large language model (for semantic parsing). Additionally, we propose a test-time semantic correction framework incorporating a prototype memory bank and an iterative refinement mechanism. Evaluated on four standard benchmarks, our approach achieves an average 12% improvement over state-of-the-art methods. Crucially, predictions exhibit pixel-level concept interpretability, enabling both visual verification and semantic-level correction at test time.
📝 Abstract
We consider the problem of single-source domain generalization. Existing methods typically rely on extensive augmentations to synthetically cover diverse domains during training. However, they struggle with semantic shifts (e.g., background and viewpoint changes), as they often learn global features instead of local concepts that tend to be domain invariant. To address this gap, we propose an approach that compels models to leverage such local concepts during prediction. Since no suitable dataset with per-class concepts and localization maps exists, we first develop a novel pipeline to generate annotations by exploiting the rich features of diffusion and large language models. Our next innovation is TIDE, a novel training scheme with a concept saliency alignment loss that ensures the model focuses on the right per-concept regions and a local concept contrastive loss that promotes learning domain-invariant concept representations. This not only yields a robust model but also one whose predictions can be visually interpreted using the predicted concept saliency maps. Given these maps at test time, our final contribution is a new correction algorithm that uses the corresponding local concept representations to iteratively refine the prediction until it aligns with the prototypical concept representations that we store at the end of model training. We evaluate our approach extensively on four standard DG benchmark datasets and substantially outperform the current state-of-the-art (12% improvement on average), while also demonstrating that our predictions can be visually interpreted.
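The test-time correction step described above can be sketched as a simple nearest-prototype refinement loop. The paper's exact update rule is not given here; this is a hypothetical illustration in which the test embedding is iteratively pulled toward its nearest stored class prototype until the assignment stabilizes (the step size `lr`, stopping tolerance, and Euclidean distance metric are all assumptions):

```python
import numpy as np

def correct_prediction(concept_emb, prototypes, steps=10, lr=0.5, tol=1e-4):
    """Iteratively refine a test-time concept embedding toward its nearest
    class prototype, re-predicting until the assignment stabilises.

    concept_emb: (D,) local concept representation from the trained model.
    prototypes:  (C, D) prototypical concept representations stored at the
                 end of training (one row per class).
    Returns (predicted_class, refined_embedding).
    """
    z = concept_emb.astype(float).copy()
    pred = None
    for _ in range(steps):
        dists = np.linalg.norm(prototypes - z, axis=1)  # distance to each prototype
        new_pred = int(np.argmin(dists))
        # pull the embedding one step toward the currently nearest prototype
        z = z + lr * (prototypes[new_pred] - z)
        # stop once the assignment is stable and the embedding has converged
        if new_pred == pred and dists[new_pred] < tol:
            break
        pred = new_pred
    return pred, z
```

The design intuition, per the abstract, is that a prediction is trusted only once its local concept representations align with the prototypes learned on the source domain, so test-time semantic drift can be corrected rather than silently propagated.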