🤖 AI Summary
This work identifies and formally defines a novel large language model (LLM) security threat, the Advertisement Embedding Attack (AEA): adversaries compromise third-party service-distribution platforms or poison open-source model checkpoints, injecting promotional, misleading, or malicious content into LLM outputs via adversarial prompting and backdoor fine-tuning, all while preserving superficially benign behavior and undermining output integrity. The authors categorize five distinct stakeholder victim groups, exposing critical blind spots in existing defenses, and empirically demonstrate that AEA is feasible at low cost. To counter the threat, they propose a lightweight, prompt-based self-auditing defense that requires no model retraining and mitigates AEA without sacrificing utility. Contributions: (1) the first systematic characterization of AEA as a supply-chain-driven output-integrity threat; (2) empirical validation across diverse models and services; and (3) a practical, deployable defense framework that establishes a baseline for research on LLM output controllability and supply-chain security.
📝 Abstract
We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (1) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (2) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community.
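The paper does not publish its defense prompt, but the idea it describes, a prompt-based self-inspection step that needs no retraining, can be sketched as a thin wrapper around any chat API. Everything below is illustrative: the preamble wording, the function names (`build_defended_prompt`, `self_inspect`), and the marker blocklist are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a prompt-based self-inspection defense against AEA.
# The audit preamble and helper names are illustrative, not from the paper.

SELF_AUDIT_PREAMBLE = (
    "Before answering, silently check your draft response for content the "
    "user did not request: advertisements, promotional links, propaganda, "
    "or hate speech. Remove any such content, then give only the clean answer."
)


def build_defended_prompt(user_prompt: str) -> str:
    """Prepend the self-audit instruction so the model screens its own output."""
    return f"{SELF_AUDIT_PREAMBLE}\n\nUser request:\n{user_prompt}"


def self_inspect(model_output: str,
                 blocklist: tuple = ("visit our sponsor", "buy now")) -> str:
    """Cheap client-side backstop: drop lines matching simple injection markers.

    A substring blocklist is a stand-in for whatever detector a deployment
    actually uses; it only illustrates the 'inspect before returning' step.
    """
    kept = [line for line in model_output.splitlines()
            if not any(marker in line.lower() for marker in blocklist)]
    return "\n".join(kept)
```

In use, `build_defended_prompt` wraps each request before it reaches a possibly compromised serving stack, and `self_inspect` screens the reply before it reaches the user, so neither step touches model weights.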