Revisiting Cross-Attention Mechanisms: Leveraging Beneficial Noise for Domain-Adaptive Learning

📅 2026-03-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of semantic misalignment in unsupervised domain adaptation caused by substantial appearance and scale discrepancies between domains. To mitigate this issue, the authors propose a cross-attention mechanism regularized by beneficial noise, integrated with multi-scale feature matching, which effectively disentangles domain-shared semantic content from domain-specific stylistic attributes. By incorporating noise-augmented cross-attention and a cross-scale alignment module into a domain-adaptive Transformer architecture, the model achieves enhanced robustness against style-induced interference and improved cross-domain semantic consistency. The proposed method attains state-of-the-art performance on standard benchmarks including VisDA-2017, yielding a 2.3% absolute improvement in overall accuracy, with a particularly notable gain of 5.9% on the "truck" category.

๐Ÿ“ Abstract
Unsupervised Domain Adaptation (UDA) seeks to transfer knowledge from a labeled source domain to an unlabeled target domain but often suffers from severe domain and scale gaps that degrade performance. Existing cross-attention-based transformers can align features across domains, yet they struggle to preserve content semantics under large appearance and scale variations. To explicitly address these challenges, we introduce the concept of beneficial noise, which regularizes cross-attention by injecting controlled perturbations, encouraging the model to ignore style distractions and focus on content. We propose the Domain-Adaptive Cross-Scale Matching (DACSM) framework, which consists of a Domain-Adaptive Transformer (DAT) for disentangling domain-shared content from domain-specific style, and a Cross-Scale Matching (CSM) module that adaptively aligns features across multiple resolutions. DAT incorporates beneficial noise into cross-attention, enabling progressive domain translation with enhanced robustness, yielding content-consistent and style-invariant representations. Meanwhile, CSM ensures semantic consistency under scale changes. Extensive experiments on VisDA-2017, Office-Home, and DomainNet demonstrate that DACSM achieves state-of-the-art performance, with up to +2.3% improvement over CDTrans on VisDA-2017. Notably, DACSM achieves a +5.9% gain on the challenging "truck" class of VisDA, evidencing the strength of beneficial noise in handling scale discrepancies. These results highlight the effectiveness of combining domain translation, beneficial-noise-enhanced attention, and scale-aware alignment for robust cross-domain representation learning.
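The abstract's core idea, injecting controlled "beneficial noise" into cross-attention so the model cannot latch onto style cues, can be illustrated with a minimal sketch. The paper's exact formulation (where the noise enters, its distribution, and any annealing schedule) is not specified here, so the version below makes a common assumption: zero-mean Gaussian noise added to the attention logits during training only. All function and parameter names (`noisy_cross_attention`, `noise_std`, etc.) are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def noisy_cross_attention(q_src, kv_tgt, w_q, w_k, w_v,
                          noise_std=0.1, train=True, rng=None):
    """Single-head cross-attention from source-domain queries to
    target-domain keys/values, with Gaussian perturbations on the
    attention logits during training (one plausible reading of
    'beneficial noise'; the paper's actual scheme may differ).

    q_src:  (n_src, d_in) source-domain token features
    kv_tgt: (n_tgt, d_in) target-domain token features
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    Q = q_src @ w_q            # (n_src, d)
    K = kv_tgt @ w_k           # (n_tgt, d)
    V = kv_tgt @ w_v           # (n_tgt, d)
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)          # (n_src, n_tgt)
    if train:
        # Perturb logits so attention cannot overfit to
        # style-specific key/query correlations.
        logits = logits + rng.normal(0.0, noise_std, logits.shape)
    return softmax(logits, axis=-1) @ V    # (n_src, d)

# Usage sketch: 4 source tokens attend over 6 target tokens.
rng = np.random.default_rng(1)
q_src = rng.normal(size=(4, 16))
kv_tgt = rng.normal(size=(6, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 8), scale=0.1) for _ in range(3))
out = noisy_cross_attention(q_src, kv_tgt, w_q, w_k, w_v)
```

At inference (`train=False`) the noise is disabled, so the module reduces to standard scaled dot-product cross-attention; the regularization effect comes only from training-time perturbations.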
Problem

Research questions and friction points this paper is trying to address.

Unsupervised Domain Adaptation
Cross-Attention
Domain Gap
Scale Variation
Content Semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

beneficial noise
cross-attention
unsupervised domain adaptation
domain-adaptive transformer
cross-scale matching