🤖 AI Summary
Class-Incremental Unsupervised Domain Adaptation (CI-UDA) requires models to continuously align cross-domain distributions and mitigate catastrophic forgetting as target-domain classes—each a subset of the source classes—arrive incrementally and without labels. To address this, we propose a CLIP-based continual domain adaptation framework that leverages CLIP's zero-shot capability to extract class-agnostic semantic attributes, constructing per-domain "visual prototype–text prompt" key-value dictionaries. By enforcing dual-domain attribute dictionary matching alongside visual attention and prediction consistency constraints, our method achieves progressive, rehearsal-free domain alignment. Unlike prior approaches that rely on exemplar replay or architectural expansion, ours preserves model compactness while enabling stable knowledge retention. Extensive experiments on three standard benchmarks demonstrate significant improvements over state-of-the-art methods, with enhanced robustness to forgetting and superior overall adaptation performance.
📝 Abstract
Class-Incremental Unsupervised Domain Adaptation (CI-UDA) aims to adapt a model from a labeled source domain to an unlabeled target domain, where the sets of target classes appearing at different time steps are disjoint subsets of the source classes. The key to solving this problem lies in avoiding catastrophic forgetting of knowledge about previous target classes while continuously mitigating the domain shift. Most previous works cumbersomely combine two technical components: on one hand, they store and replay rehearsal target samples from previous time steps to avoid catastrophic forgetting; on the other hand, they align only the classes shared across domains at each time step. Consequently, memory consumption grows continuously, and the asymmetric alignment may inevitably cause knowledge forgetting. In this paper, we propose to mine and preserve domain-invariant and class-agnostic knowledge to facilitate the CI-UDA task. Specifically, using CLIP, we extract class-agnostic properties that we term "attributes". In our framework, we learn a "key-value" pair to represent each attribute, where the key corresponds to a visual prototype and the value is a textual prompt. We maintain two attribute dictionaries, one per domain, and then perform attribute alignment across domains to mitigate the domain shift by encouraging visual attention consistency and prediction consistency. Through attribute modeling and cross-domain alignment, we effectively reduce catastrophic forgetting while mitigating the domain shift in a rehearsal-free way. Experiments on three CI-UDA benchmarks demonstrate that our method outperforms previous state-of-the-art methods and effectively alleviates catastrophic forgetting. Code is available at https://github.com/RyunMi/VisTA.
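The "key-value" attribute dictionary and the visual attention consistency constraint described above can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's implementation: class names, dictionary sizes, the cosine-similarity attention, and the L1 consistency loss are all assumptions, and random vectors stand in for CLIP visual features and prompt embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize along the feature axis, as is standard for CLIP embeddings
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

class AttributeDictionary:
    """Hypothetical sketch of one per-domain attribute dictionary:
    each attribute is a key-value pair, where the key is a learnable
    visual prototype and the value is a textual prompt embedding."""
    def __init__(self, num_attributes, dim, rng):
        self.keys = l2_normalize(rng.standard_normal((num_attributes, dim)))
        self.values = l2_normalize(rng.standard_normal((num_attributes, dim)))

    def attend(self, visual_feats):
        # Attention of image features over prototype keys via cosine similarity,
        # normalized with a softmax over attributes
        sims = l2_normalize(visual_feats) @ self.keys.T
        exp = np.exp(sims - sims.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)  # (batch, num_attributes)

def attention_consistency_loss(w_src, w_tgt):
    # Encourage the two domain dictionaries to attend to the same attributes
    # for the same images (L1 distance between attention maps, an assumption)
    return float(np.abs(w_src - w_tgt).mean())

rng = np.random.default_rng(0)
src_dict = AttributeDictionary(num_attributes=8, dim=16, rng=rng)
tgt_dict = AttributeDictionary(num_attributes=8, dim=16, rng=rng)
feats = rng.standard_normal((4, 16))  # stand-in for a batch of CLIP features
loss = attention_consistency_loss(src_dict.attend(feats), tgt_dict.attend(feats))
```

Minimizing such a consistency loss pulls the two dictionaries toward agreeing on which class-agnostic attributes an image exhibits, which is what lets alignment proceed without storing rehearsal samples.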