🤖 AI Summary
Existing poisoning attacks against multimodal large language model–based recommender systems are often mitigated by cross-modal consistency mechanisms. This work formalizes, for the first time, the threat of cross-modal interactive poisoning and proposes a method that identifies high-exposure regions in the joint embedding space and generates attention-guided, text-image coupled perturbations to achieve targeted manipulation during fine-tuning. By integrating exposure alignment, cross-modal interactive perturbation generation, and alignment optimization, the proposed approach attains an average ER@20 of 0.73 across three real-world datasets, surpassing the strongest baseline by 0.52 absolute points, while largely preserving recommendation utility. These results reveal a previously unexamined security vulnerability in multimodal fusion recommender systems.
📝 Abstract
Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that manipulates interaction logs or perturbs a single modality, it also introduces a new attack surface: synchronised multimodal poisoning can reliably steer fused representations along stable semantic directions during fine-tuning. To characterise this threat, we formalise cross-modal interactive poisoning and propose VENOMREC, which performs Exposure Alignment to identify high-exposure regions in the joint embedding space and Cross-modal Interactive Perturbation to craft attention-guided coupled token-patch edits. Experiments on three real-world multimodal datasets demonstrate that VENOMREC consistently outperforms strong baselines, achieving a mean ER@20 of 0.73 and improving over the strongest baseline by 0.52 absolute ER points on average, while maintaining comparable recommendation utility.
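The coupled token-patch idea can be sketched as follows. This is a minimal illustrative toy, not VENOMREC's actual procedure: the function name, the plain dot-product attention, and the eps-scaled update toward a target item embedding are all assumptions introduced for exposition. The sketch captures the two claims in the abstract: attention selects the (token, patch) pairs that dominate the fused representation, and both modalities are then perturbed along the same semantic direction so the edit stays cross-modally consistent.

```python
import numpy as np

def coupled_perturbation(text_emb, image_emb, target_emb, eps=0.1, top_k=2):
    """Toy sketch of attention-guided coupled token-patch perturbation.

    text_emb:  (T, d) token embeddings of the item description.
    image_emb: (P, d) patch embeddings of the item image.
    target_emb: (d,) embedding of the item/region the attacker promotes.
    All names and the update rule are illustrative, not the paper's method.
    """
    # Cross-modal attention scores: similarity of every token to every patch.
    attn = text_emb @ image_emb.T                        # shape (T, P)

    # Pick the top-k highest-attention (token, patch) pairs; these pairs
    # contribute most to the fused representation, so editing them moves
    # the fusion output the furthest per unit of perturbation.
    flat_idx = np.argsort(attn, axis=None)[::-1][:top_k]
    pairs = [np.unravel_index(i, attn.shape) for i in flat_idx]

    text_adv, image_adv = text_emb.copy(), image_emb.copy()
    for t, p in pairs:
        # Nudge BOTH modalities toward the same target direction, so the
        # perturbation survives cross-modal consistency checks.
        d_t = target_emb - text_adv[t]
        d_p = target_emb - image_adv[p]
        text_adv[t] += eps * d_t / (np.linalg.norm(d_t) + 1e-8)
        image_adv[p] += eps * d_p / (np.linalg.norm(d_p) + 1e-8)
    return text_adv, image_adv, pairs
```

A real attack would operate on discrete tokens and pixel patches rather than embeddings directly, and would fold these edits into the fine-tuning data; the sketch only shows the selection-then-coupled-update structure.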