Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

📅 2025-08-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies and formally defines a novel large language model (LLM) security threat—Adversarial Embedding Attack (AEA): adversaries compromise third-party service platforms or poison open-source model checkpoints, injecting promotional, misleading, or malicious content into LLM outputs via adversarial prompting and backdoor fine-tuning—while preserving superficially benign behavior and undermining output integrity. We first categorize five distinct stakeholder victim groups, exposing critical blind spots in existing defenses. Empirical evaluation demonstrates AEA’s high feasibility under low overhead. To counter this threat, we propose a lightweight, self-auditing prompt-based defense that requires no model retraining and effectively mitigates AEA without sacrificing utility. Our contributions include: (1) the first systematic characterization of AEA as a supply-chain–driven output-integrity threat; (2) empirical validation across diverse models and services; and (3) a practical, deployable defense framework establishing a new benchmark for LLM output controllability and supply-chain security research.

Technology Category

Application Category

📝 Abstract
We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (1) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (2) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community.
Problem

Research questions and friction points this paper is trying to address.

Stealthily inject promotional or malicious content into LLM outputs
Hijack third-party platforms to prepend adversarial prompts
Publish back-doored open-source checkpoints with attacker data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Advertisement Embedding Attacks hijack third-party platforms
Publish back-doored open-source checkpoints with attacker data
Prompt-based self-inspection defense mitigates injections without retraining
🔎 Similar Papers
No similar papers found.
Qiming Guo
Qiming Guo
RA at Texas A&M University-CC
ST-GraphAI for ScienceAI SecurityMachine UnlearningAI for Mental Health
J
Jinwen Tang
EECS Department, University of Missouri
X
Xingran Huang
Department of Computer Engineering, University of California–Riverside