🤖 AI Summary
This work addresses a critical gap in existing research by investigating the vulnerability of large language model (LLM) agents to hardware-level bit-flip attacks in multi-stage reasoning and tool-use scenarios. We propose Flip-Agent, a framework that extends targeted bit-flip attacks to LLM agent systems for the first time, using hardware fault injection to manipulate agent behavior precisely at the parameter level. Our approach enables fine-grained control over the agent's reasoning trajectory and its tool-invocation decisions. Experiments on realistic multi-stage tasks show that Flip-Agent significantly outperforms existing attack methods, successfully steering both the agent's outputs and its tool-usage patterns. These findings reveal a fundamental security weakness in LLM agents deployed in complex, real-world environments.
📄 Abstract
Targeted bit-flip attacks (BFAs) exploit hardware faults to corrupt model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces that remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, which manipulates both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
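To make the threat model concrete, the sketch below (illustrative only, not the paper's attack) shows why a single hardware-induced bit flip can be catastrophic: flipping one exponent bit of an IEEE-754 float32 weight changes its magnitude by many orders of magnitude, which is the property targeted BFAs exploit when selecting which parameter bits to flip.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value and return the corrupted float.

    The float is reinterpreted as a 32-bit integer, a single bit is
    XOR-toggled, and the bits are reinterpreted as a float again.
    """
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (corrupted,) = struct.unpack("<f", struct.pack("<I", as_int))
    return corrupted

# Example: a benign-looking weight of 0.5 has exponent bits 01111110.
# Flipping bit 30 (the exponent's most significant bit) yields
# exponent 11111110, i.e. 2**127 ~ 1.7e38 -- a single flip turns a
# small weight into an enormous one, derailing downstream activations.
w = 0.5
corrupted = flip_bit(w, 30)
print(f"{w} -> {corrupted:e}")
```

Flipping the same bit again restores the original value, which is why a targeted attacker can also revert a flip to evade detection.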