FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing bit-flip attack (BFA) discovery methods suffer from poor generality and weak scalability, hindering efficient hardware-level vulnerability assessment of multimodal large language models (MLLMs). Method: This work pioneers modeling BFA search as a reinforcement learning (RL) sequential decision problem, proposing an architecture-agnostic, scalable Q-learning framework integrated with sensitivity-guided layer pruning to precisely identify minimal critical bit sets. Contribution/Results: Evaluated on LLaMA 3.1 8B and LLaVA, the method achieves catastrophic degradation, reducing MMLU accuracy from 69.9% to 0.2% and the VQAv2 score to near zero, with only 5–7 flipped bits. It accelerates attack localization by 2.5× and yields bit sets directly actionable for hardware-level defenses (e.g., ECC deployment), rendering attacks entirely ineffective post-hardening. To the best of our knowledge, this is the first RL-driven paradigm for efficient, generalizable hardware security evaluation of large models.
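For intuition on why a handful of flips can be catastrophic: model weights are stored as floating-point values, and a single flip in a high exponent bit changes a weight's magnitude by orders of magnitude. A minimal sketch (ours, not the paper's code) using NumPy on a float16 weight:

```python
import numpy as np

def flip_bit(weight, bit: int) -> np.float16:
    """Flip one bit of a float16 weight (0 = mantissa LSB, 14 = top exponent bit, 15 = sign)."""
    raw = np.array(weight, dtype=np.float16).view(np.uint16)  # reinterpret the 16 bits
    raw ^= np.uint16(1 << bit)                                # toggle the chosen bit
    return raw.view(np.float16)[()]

w = np.float16(0.1)
# Flipping bit 14 (the top exponent bit, which is 0 for small weights)
# scales the weight by 2**16, turning ~0.1 into a value in the thousands.
print(w, "->", flip_bit(w, 14))
```

Attacks like Rowhammer realize such flips in DRAM; the hard part, which FlipLLM addresses, is finding *which* few bits among billions matter.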

📝 Abstract
Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs). Existing BFA discovery methods lack generalizability and struggle to scale, often failing to analyze the vast parameter space and complex interdependencies of modern foundation models in a reasonable time. This paper proposes FlipLLM, an architecture-agnostic reinforcement learning (RL) framework that formulates BFA discovery as a sequential decision-making problem. FlipLLM combines sensitivity-guided layer pruning with Q-learning to efficiently identify minimal, high-impact bit sets that can induce catastrophic failure. We demonstrate the effectiveness and generalizability of FlipLLM by applying it to a diverse set of models, including prominent text-only LLMs (GPT-2 Large, LLaMA 3.1 8B, and DeepSeek-V2 7B) and VLMs such as LLaVA 1.6, across datasets including MMLU, MMLU-Pro, VQAv2, and TextVQA. Our results show that FlipLLM identifies critical bits vulnerable to BFAs up to 2.5× faster than state-of-the-art methods. Flipping the FlipLLM-identified bits causes the accuracy of LLaMA 3.1 8B to plummet from 69.9% to ~0.2%, and LLaVA's VQA score from 78% to almost 0%, with as few as 5 and 7 bit flips, respectively. Further analysis reveals that applying standard hardware protection mechanisms, such as ECC SECDED, to the FlipLLM-identified bit locations completely mitigates the BFA impact, demonstrating the practical value of our framework in guiding hardware-level defenses. FlipLLM offers the first scalable and adaptive methodology for exploring the BFA vulnerability of both language and multimodal foundation models, paving the way for comprehensive hardware-security evaluation.
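The ECC SECDED defense mentioned in the abstract (single-error correction, double-error detection) can be illustrated with the classic Hamming(7,4) code plus an overall parity bit. This is a generic textbook sketch, not the paper's hardware configuration:

```python
# SECDED sketch: Hamming(7,4) + one overall parity bit = an 8-bit codeword
# that corrects any single bit flip and detects any double flip.

def secded_encode(d):
    """d: 4 data bits -> 8-bit codeword [overall, p1, p2, d1, p3, d2, d3, d4]."""
    c = [0] * 8                        # c[1..7] = Hamming(7,4); c[0] = overall parity
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]          # p1 covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]          # p2 covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]          # p3 covers positions 4, 5, 6, 7
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    return c

def secded_decode(c):
    """Return (data, status); status in {'ok', 'corrected', 'double-error'}."""
    c = list(c)
    pos = ((c[1] ^ c[3] ^ c[5] ^ c[7]) +           # syndrome bits spell out the
           (c[2] ^ c[3] ^ c[6] ^ c[7]) * 2 +       # position (1..7) of a single
           (c[4] ^ c[5] ^ c[6] ^ c[7]) * 4)        # flipped bit; 0 = no flip
    overall = c[0] ^ c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    if pos == 0 and overall == 0:
        return [c[3], c[5], c[6], c[7]], 'ok'
    if overall == 1:                               # odd parity: single flip, fixable
        c[pos if pos else 0] ^= 1
        return [c[3], c[5], c[6], c[7]], 'corrected'
    return None, 'double-error'                    # even parity, nonzero syndrome

word = [1, 0, 1, 1]
cw = secded_encode(word)
cw[6] ^= 1                                         # inject a single bit flip
print(secded_decode(cw))                           # ([1, 0, 1, 1], 'corrected')
```

Protecting only the few FlipLLM-identified locations with such a code is far cheaper than blanket ECC over all weights, which is the practical point of returning a minimal bit set.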
Problem

Research questions and friction points this paper is trying to address.

Identifies vulnerable bits in AI models using reinforcement learning
Scales bit-flip attack discovery across diverse multimodal foundation models
Guides hardware defenses by efficiently locating critical bit-flip locations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning framework for bit-flip attacks
Combines sensitivity pruning with Q-learning for efficiency
Identifies minimal high-impact bits to induce failures
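The sequential-decision framing above can be sketched with tabular Q-learning on a toy surrogate. Everything here is an illustrative assumption rather than the paper's setup: the candidate pool, the hypothetical "critical" bit set, and the surrogate accuracy function are invented; the state is the set of bits flipped so far, an action flips one more candidate bit, and the reward is the resulting accuracy drop.

```python
import random

random.seed(0)

N_BITS = 8             # candidate bits surviving sensitivity pruning (toy size)
CRITICAL = {1, 4, 6}   # hypothetical truly damaging bits (unknown to the agent)
MAX_FLIPS = 3          # flip budget per episode

def accuracy(flipped):
    """Toy surrogate: accuracy collapses as critical bits are hit."""
    return max(0.0, 0.7 - 0.23 * len(flipped & CRITICAL))

Q = {}  # (state, action) -> value, where state = frozenset of flipped bits

def best_action(state, actions):
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(500):
    state = frozenset()
    for _ in range(MAX_FLIPS):
        actions = [a for a in range(N_BITS) if a not in state]
        a = random.choice(actions) if random.random() < eps else best_action(state, actions)
        nxt = state | {a}
        reward = accuracy(state) - accuracy(nxt)   # accuracy drop from this flip
        future = max((Q.get((nxt, b), 0.0) for b in range(N_BITS) if b not in nxt),
                     default=0.0)
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (reward + gamma * future - old)
        state = nxt

# Greedy rollout: the minimal bit set the learned policy would flip.
state = frozenset()
for _ in range(MAX_FLIPS):
    state = state | {best_action(state, [a for a in range(N_BITS) if a not in state])}
print(sorted(state))
```

Because the reward is the marginal accuracy drop, bits that only hurt in combination with others still accumulate value through the discounted future term, which is what lets the agent assemble a minimal set rather than rank bits independently.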
Khurram Khalil
Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, USA
Khaza Anuarul Hoque
Associate Professor, Electrical Engineering and Computer Science (EECS), University of Missouri
Hardware, EDA, AI, Formal Methods, Trustworthy XR