🤖 AI Summary
This work addresses a critical gap in existing research by investigating the vulnerability of large language model (LLM) agents to hardware-level bit-flip attacks in multi-stage reasoning and tool-use scenarios. We propose Flip-Agent, a framework that extends targeted bit-flip attacks to LLM agent systems for the first time, using hardware fault injection to manipulate agent behavior precisely at the parameter level. Our approach enables fine-grained control over the agent's reasoning trajectory and its tool-invocation decisions. Experiments on realistic multi-stage tasks show that Flip-Agent significantly outperforms existing attack methods, successfully steering both the agent's outputs and its tool-usage patterns. These findings reveal a fundamental security weakness in LLM agents deployed in complex, real-world environments.
📄 Abstract
Targeted bit-flip attacks (BFAs) exploit hardware faults to corrupt model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces that remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, which manipulates both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
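To make the threat model concrete, the sketch below (illustrative only, not the paper's attack) shows why a single hardware-induced bit flip can be catastrophic: flipping one exponent bit of an IEEE-754 float32 weight changes its magnitude by many orders of magnitude, which is the property targeted BFAs exploit when selecting which parameter bits to flip.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value and return the corrupted float.

    The float is reinterpreted as a 32-bit integer, a single bit is
    XOR-toggled, and the bits are reinterpreted as a float again.
    """
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (corrupted,) = struct.unpack("<f", struct.pack("<I", as_int))
    return corrupted

# Example: a benign-looking weight of 0.5 has exponent bits 01111110.
# Flipping bit 30 (the exponent's most significant bit) yields
# exponent 11111110, i.e. 2**127 ~ 1.7e38 -- a single flip turns a
# small weight into an enormous one, derailing downstream activations.
w = 0.5
corrupted = flip_bit(w, 30)
print(f"{w} -> {corrupted:e}")
```

Flipping the same bit again restores the original value, which is why a targeted attacker can also revert a flip to evade detection.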