Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

๐Ÿ“… 2026-03-29
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the vulnerability of vision-language models (VLMs) in recommendation scenarios to unauthorized advertisement injection by proposing a novel backdoor attack method grounded in authentic user behavior. Specifically, the attack is triggered when users naturally upload images containing specific semantic content and pose recommendation-related queries, enabling stealthy insertion of attacker-specified promotional messages. Unlike prior approaches relying on synthetic triggers, this method leverages genuine userโ€“model interactions, ensuring high naturalness and concealment without degrading the modelโ€™s original performance. Through a multi-level threat modeling framework, chain-of-thought data generated by teacher models, and optimized soft prompting combined with supervised fine-tuning, the approach achieves effective backdoor implantation. Experiments demonstrate strong transferability across three mainstream VLM architectures, low false-positive rates, support for multiple advertising domains, and resilience against existing defenses that struggle to remove the backdoor without compromising model utility.
๐Ÿ“ Abstract
Vision-Language Models (VLMs) are increasingly deployed in consumer applications where users seek recommendations about products, dining, and services. We introduce Hidden Ads, a new class of backdoor attacks that exploit this recommendation-seeking behavior to inject unauthorized advertisements. Unlike traditional pattern-triggered backdoors that rely on artificial triggers such as pixel patches or special tokens, Hidden Ads activates on natural user behaviors: when users upload images containing semantic content of interest (e.g., food, cars, animals) and ask recommendation-seeking questions, the backdoored model provides correct, helpful answers while seamlessly appending attacker-specified promotional slogans. This design preserves model utility and produces natural-sounding injections, making the attack practical for real-world deployment in consumer-facing recommendation services. We propose a multi-tier threat framework to systematically evaluate Hidden Ads across three adversary capability levels: hard prompt injection, soft prompt optimization, and supervised fine-tuning. Our poisoned data generation pipeline uses teacher VLM-generated chain-of-thought reasoning to create natural trigger--slogan associations across multiple semantic domains. Experiments on three VLM architectures demonstrate that Hidden Ads achieves high injection efficacy with near-zero false positives while maintaining task accuracy. Ablation studies confirm that the attack is data-efficient, transfers effectively to unseen datasets, and scales to multiple concurrent domain-slogan pairs. We evaluate defenses including instruction-based filtering and clean fine-tuning, finding that both fail to remove the backdoor without causing significant utility degradation.
Problem

Research questions and friction points this paper is trying to address.

backdoor attacks
advertisement injection
vision-language models
semantic triggers
recommendation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic backdoor
advertisement injection
vision-language models
behavior-triggered attack
multi-tier threat framework
๐Ÿ”Ž Similar Papers
No similar papers found.