VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing unsupervised domain adaptation (UDA) methods fine-tune the backbone network for each source–target domain pair, so trainable parameters and memory grow linearly with the number of pairs and the backbone cannot be reused across domains. To address this, we propose Vision Reprogramming for Domain Adaptation (VirDA), which inserts a learnable visual reprogramming layer *before* a frozen backbone: the layer produces visual prompts that act as a texture bias on the input, adapting its "style" to the target domain without any backbone fine-tuning. VirDA optimizes these reprogramming layers end-to-end with multiple objective functions that reduce intra- and inter-domain distribution differences. On Office-31, VirDA achieves a mean accuracy of 92.8% with only 1.5M trainable parameters, outperforming CDTrans and FixBi by 0.2% and 1.4%, respectively, while using merely 1.7% and 2.8% of their parameter counts. This yields substantial gains in parameter efficiency and enables robust backbone reuse across diverse domain pairs.

📝 Abstract
Existing UDA pipelines fine-tune already well-trained backbone parameters for every new source–target pair, so the number of trainable parameters and the storage required grow linearly with each new pair, and the well-trained backbone cannot be reused. Inspired by recent findings that existing backbones exhibit texture bias, we propose exploiting domain-specific texture bias for domain adaptation via visual reprogramming, namely VirDA. Instead of fine-tuning the full backbone, VirDA prepends a domain-specific visual reprogramming layer to the backbone. This layer produces visual prompts that act as an added texture bias on the input image, adapting its "style" to a target domain. To optimize these visual reprogramming layers, we use multiple objective functions that minimize the intra- and inter-domain distribution differences arising when domain-adapting visual prompts are applied. This process does not modify the backbone parameters, so the same backbone can be reused across different domains. We evaluate VirDA on Office-31 and obtain 92.8% mean accuracy with only 1.5M trainable parameters. VirDA surpasses PDA, the state-of-the-art parameter-efficient UDA baseline, by +1.6% accuracy while using just 46% of its parameters. Compared with full-backbone fine-tuning, VirDA outperforms CDTrans and FixBi by +0.2% and +1.4%, respectively, while requiring only 1.7% and 2.8% of their trainable parameters. Relative to the strongest current methods (PMTrans and TVT), VirDA uses ~1.7% of their parameters and trades off only 2.2% and 1.1% accuracy, respectively.
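The core idea, a learnable reprogramming layer prepended to a frozen backbone, can be sketched in PyTorch. This is a minimal illustration under assumed names and shapes: the additive prompt, the layer name `ReprogrammingLayer`, and the toy convolutional backbone are placeholders, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ReprogrammingLayer(nn.Module):
    """Hypothetical sketch: a learnable image-sized visual prompt
    added to the input as a textural bias ("style" adaptation)."""
    def __init__(self, channels=3, size=224):
        super().__init__()
        # one learnable prompt, broadcast over the batch dimension
        self.prompt = nn.Parameter(torch.zeros(1, channels, size, size))

    def forward(self, x):
        return x + self.prompt  # reprogram the input, not the backbone

# toy stand-in for a pretrained backbone, frozen so it can be
# reused across different source-target domain pairs
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False

reprog = ReprogrammingLayer()
x = torch.randn(2, 3, 224, 224)
feat = backbone(reprog(x))  # only reprog.prompt receives gradients
```

Because the backbone is frozen, adapting to a new domain pair only adds one prompt layer's worth of parameters rather than a full copy of the backbone.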
Problem

Research questions and friction points this paper is trying to address.

Reducing linear parameter growth in unsupervised domain adaptation methods
Enabling backbone reuse across domains without fine-tuning parameters
Optimizing domain-specific visual prompts to adapt image styles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses visual reprogramming layer for domain adaptation
Optimizes prompts without modifying backbone parameters
Achieves high accuracy with minimal trainable parameters
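The points above can be illustrated with a toy training step in which only the prompt parameters are optimized. This is a hypothetical sketch: the tiny linear backbone and the mean-feature alignment loss are placeholders for the paper's backbone and its intra-/inter-domain objectives.

```python
import torch
import torch.nn as nn

# frozen stand-in backbone: its parameters are never updated,
# so the same weights can serve every domain pair
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
for p in backbone.parameters():
    p.requires_grad = False

# domain-specific visual prompt is the ONLY trainable tensor
prompt = nn.Parameter(torch.zeros(1, 3, 32, 32))
opt = torch.optim.Adam([prompt], lr=1e-3)

src = torch.randn(4, 3, 32, 32)   # source-domain batch
tgt = torch.randn(4, 3, 32, 32)   # target-domain batch

f_src = backbone(src)             # source passes through unchanged
f_tgt = backbone(tgt + prompt)    # target input is "reprogrammed"

# toy inter-domain alignment: match mean features across domains
loss = (f_src.mean(0) - f_tgt.mean(0)).pow(2).sum()
loss.backward()
opt.step()

n_trainable = sum(t.numel() for t in [prompt] if t.requires_grad)
```

Counting `n_trainable` makes the efficiency argument concrete: the optimizer state and stored weights per domain pair scale with the prompt size alone, not with the backbone.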