SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) face model integrity threats from bit-flip attacks (BFAs) during deployment. Existing BFAs are constrained by data-type isolation and suffer from poor stealth in floating-point models, where random single-bit flips often induce numerical anomalies (e.g., NaN/Inf). Method: We propose SBFA—the first single-bit, stealthy BFA targeting mixed-precision (BF16/INT8) LLMs—leveraging an ImpactScore metric that jointly models gradient sensitivity and inter-layer weight distribution constraints, coupled with a lightweight SKIP search algorithm to precisely identify critical bits. Results: On Qwen, LLaMA, and Gemma, a single-bit flip suffices to degrade MMLU and SST-2 accuracy to chance level; attack execution requires only tens of minutes. SBFA is the first to empirically demonstrate the extreme vulnerability of LLMs to single-bit perturbations, establishing a new benchmark for robustness evaluation in mixed-precision inference.

📝 Abstract
The model integrity of large language models (LLMs) has become a pressing security concern with their massive online deployment. Prior Bit-Flip Attacks (BFAs) -- a class of popular AI weight memory fault-injection techniques -- can severely compromise Deep Neural Networks (DNNs): as few as tens of bit flips can degrade accuracy toward random guessing. Recent studies extend BFAs to LLMs and reveal that, despite the intuition of better robustness from modularity and redundancy, a handful of adversarial bit flips can likewise cause catastrophic accuracy degradation in LLMs. However, existing BFA methods typically focus on either integer or floating-point models separately, limiting attack flexibility. Moreover, in floating-point models, random bit flips often drive perturbed parameters to extreme values (e.g., when an exponent bit is flipped), making the attack easy to detect and causing numerical runtime errors (e.g., invalid tensor values such as NaN/Inf). In this work, for the first time, we propose SBFA (Sneaky Bit-Flip Attack), which collapses LLM performance with only a single bit flip while keeping perturbed values within the benign layer-wise weight distribution. This is achieved by iteratively searching and ranking candidate bits with our parameter sensitivity metric, ImpactScore, which combines gradient sensitivity with a perturbation range constrained by the benign layer-wise weight distribution. We also propose a novel lightweight SKIP search algorithm that greatly reduces search complexity, so a successful SBFA search takes only tens of minutes on SOTA LLMs. Across Qwen, LLaMA, and Gemma models, with only one single bit flip, SBFA degrades accuracy to below random levels on MMLU and SST-2 in both BF16 and INT8 data formats. Remarkably, a single flipped bit among billions of parameters suffices, revealing a severe security concern for SOTA LLMs.
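The abstract's stealth argument hinges on BF16 bit layout: BF16 is the top 16 bits of IEEE-754 float32 (1 sign, 8 exponent, 7 mantissa bits), so flipping a high exponent bit catapults a small weight to an astronomical magnitude, while a mantissa-bit flip barely moves it. A minimal illustration (helper names are ours, not from the paper):

```python
import struct

def bf16_bits(x: float) -> int:
    """Top 16 bits of the IEEE-754 float32 encoding of x (its BF16 form)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def bf16_to_float(bits: int) -> float:
    """Reinterpret 16 BF16 bits as float32 (low 16 mantissa bits zeroed)."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

def flip_bit(x: float, pos: int) -> float:
    """Flip bit `pos` of x's BF16 encoding.
    pos 0-6: mantissa, 7-14: exponent, 15: sign."""
    return bf16_to_float(bf16_bits(x) ^ (1 << pos))

w = 0.0123  # a typical small LLM weight
# Flipping the exponent MSB (bit 14) pushes the weight to ~1e36:
# trivially detectable and prone to NaN/Inf downstream.
huge = flip_bit(w, 14)
# Flipping the mantissa LSB (bit 0) barely changes the value; SBFA's
# point is that such in-distribution flips can still be chosen to
# collapse accuracy.
tiny_change = flip_bit(w, 0)
```

This is why SBFA constrains perturbed values to the benign layer-wise weight range: exponent-driven outliers like `huge` would be caught by simple range checks, while flips like `tiny_change` stay inside the distribution.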
Problem

Research questions and friction points this paper is trying to address.

Can a single, stealthy bit flip collapse LLM performance?
Existing BFA methods target integer or floating-point models separately, limiting attack flexibility
Random flips in floating-point models push weights to extreme values (e.g., NaN/Inf), undermining stealth
Innovation

Methods, ideas, or system contributions that make the work stand out.

A single bit flip suffices to degrade LLM accuracy to below random levels
ImpactScore combines gradient sensitivity with a perturbation range constrained by the benign layer-wise weight distribution
Lightweight SKIP search algorithm cuts search complexity, completing attacks in tens of minutes
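The paper does not give ImpactScore's exact formula here, but from the description (gradient sensitivity jointly scored with a perturbation kept inside the layer's benign weight range) a hypothetical ranking sketch might look like this; all names and the scoring form are our assumptions:

```python
import numpy as np

def impact_score(weights, grads, flipped, lo, hi):
    """Hypothetical sketch of an ImpactScore-style ranking:
    score = |gradient| * |value change from the flip|, zeroed out when
    the post-flip value would leave the layer's benign range [lo, hi]
    (the stealth constraint described in the abstract)."""
    delta = np.abs(flipped - weights)      # loss impact proxy per weight
    in_range = (flipped >= lo) & (flipped <= hi)
    return np.abs(grads) * delta * in_range

# Toy layer: three weights, their gradients, and candidate post-flip values.
w = np.array([0.01, -0.02, 0.03])
g = np.array([0.50, 2.00, 1.00])
cand = np.array([0.04, 5.00, -0.03])  # 5.0 falls outside the layer range
scores = impact_score(w, g, cand, lo=-0.1, hi=0.1)
best = int(np.argmax(scores))  # index of the most damaging stealthy flip
```

The out-of-range candidate is excluded despite its large gradient, mirroring how SBFA avoids the detectable extreme-value flips that plague naive floating-point BFAs; the actual SKIP algorithm additionally prunes the candidate set to keep the search to tens of minutes.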