Band Together: Untargeted Adversarial Training with Multimodal Coordination against Evasion-based Promotion Attacks

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Multimodal recommender systems are vulnerable to evasion-based promotion attacks, yet existing defenses are largely confined to single-modal settings and designed for poisoning attacks, leaving them inadequate against this threat. This work proposes UAT-MC. It identifies a cross-modal gradient mismatch in the multi-user promotion setting, where visual and textual perturbations are optimized in inconsistent directions, and introduces a gradient alignment mechanism that coordinates the direction of untargeted adversarial perturbations across the two modalities. Because the target items of an evasion attack are unknown in advance, the method treats all items as potential targets during adversarial training. The reported result is significantly improved robustness against promotion attacks while maintaining acceptable recommendation performance under the defense-accuracy trade-off.

📝 Abstract

Multimodal recommender systems exploit visual and textual signals to alleviate data sparsity, but this also makes them more vulnerable to evasion-based promotion attacks. Existing defenses are largely limited to single-modal settings and mainly focus on poisoning-based threats, leaving evasion-based threats underexplored. In this work, we first identify a cross-modal gradient mismatch under the multi-user promotion setting, where visual and textual perturbations are optimized in inconsistent directions due to the dominance of distinct user groups. This phenomenon dilutes the attack effectiveness and leads robust training to underestimate worst-case risks. To address this issue, we propose Untargeted Adversarial Training with Multimodal Coordination (UAT-MC). UAT-MC tackles the challenge of unknown targeted items in evasion-based attacks (as opposed to poisoning-based attacks) by treating all items as potential targets, and introduces a gradient alignment mechanism to explicitly correct this mismatch. This design ensures synchronized perturbations across modalities, thereby maximizing adversarial strength for robust training. Extensive experiments demonstrate that UAT-MC significantly improves robustness against promotion attacks while maintaining acceptable recommendation performance under the defense-accuracy trade-off. Code is available at https://github.com/gmXian/UAT-MC.
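The mismatch described in the abstract can be illustrated with a toy calculation. The sketch below is not the UAT-MC implementation: the per-user gradients are made-up numbers, `contribution_profile` is a hypothetical way to measure which users dominate each modality's aggregated gradient, and the "shared weights" step is only one simple assumption about what aligning the two modalities could look like.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def aggregate(user_grads, weights=None):
    """Weighted sum of per-user gradients (uniform weights by default)."""
    weights = weights or [1.0] * len(user_grads)
    dim = len(user_grads[0])
    return [sum(w * g[i] for w, g in zip(weights, user_grads)) for i in range(dim)]

def contribution_profile(user_grads):
    """Hypothetical measure: each user's normalized, non-negative share
    of the aggregated gradient direction."""
    agg = aggregate(user_grads)
    raw = [max(dot(g, agg), 0.0) for g in user_grads]
    total = sum(raw) or 1.0
    return [r / total for r in raw]

# Made-up per-user gradients of a promotion objective with respect to each
# modality's perturbation. Users 0-1 dominate the visual gradient,
# users 2-3 dominate the textual gradient.
visual_grads = [[4.0, 0.0], [3.0, 1.0], [0.1, 0.2], [0.0, 0.1]]
textual_grads = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 3.0, 1.0], [0.0, 4.0, 0.0]]

p_visual = contribution_profile(visual_grads)
p_textual = contribution_profile(textual_grads)

# Near 1.0 means the two modalities are driven by different user groups.
mismatch = 1.0 - cosine(p_visual, p_textual)

# Assumed alignment step: both modalities share one set of user weights,
# so their perturbation directions serve the same users.
shared = [(a + b) / 2 for a, b in zip(p_visual, p_textual)]
aligned_visual = aggregate(visual_grads, shared)
aligned_textual = aggregate(textual_grads, shared)

print(f"mismatch before alignment: {mismatch:.3f}")
```

In this toy setting the two contribution profiles barely overlap, which is the situation the abstract says dilutes attack strength. The repository linked in the abstract is the place to check how the authors actually define and correct the mismatch.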

Problem

Research questions and friction points this paper is trying to address.

multimodal recommender systems

evasion-based attacks

promotion attacks

adversarial robustness

cross-modal gradient mismatch

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal coordination

evasion-based attacks

adversarial training