Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work reveals a novel security threat at the intersection of large language model (LLM) inference compilation and numerical stability, demonstrating that compiler-level optimizations can introduce exploitable numerical side effects that serve as a stealthy attack surface. The authors propose a backdoor attack framework that requires no modification to compilers or hardware, leveraging input-specific and input-agnostic trigger strategies in conjunction with numerical perturbations inherent in mainstream LLM compilation pipelines. This approach precisely induces targeted or arbitrary mispredictions in compiled models while preserving near-perfect accuracy on clean inputs in both original and compiled states. Experimental evaluation across four widely used open-source LLMs and four distinct tasks shows an average attack success rate of 90%, highlighting the efficacy and stealthiness of the proposed method.

📝 Abstract

Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While compilation assumes semantic equivalence between the original and compiled graphs, we are the first to show that its numerical side effects can be maliciously exploited to implant stealthy backdoors in LLMs. We propose a unified optimization-triggered attack framework comprising two complementary strategies. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations that are run without compilation. We empirically demonstrate that these optimization-triggered backdoors achieve attack success rates averaging 90% across four mainstream open-source LLMs and four tasks, while clean accuracy is preserved at nearly 100% under all settings. Our findings reveal a novel attack surface at the intersection of optimization and security in the LLM deployment pipeline, and we investigate practical defenses to mitigate this threat.
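The kind of numerical side effect the abstract refers to can be illustrated with a toy example. The sketch below is not the paper's attack; it only assumes that a compiler may reorder a floating-point reduction, which is mathematically equivalent but not bit-for-bit equivalent because floating-point addition is not associative. The names `reduce_in_order`, `predict`, and the specific values are hypothetical and chosen purely for illustration.

```python
def reduce_in_order(values, order):
    """Sum `values` in the given index order using float64 addition."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def predict(logit_a, logit_b):
    """Return class 0 if logit_a wins, otherwise class 1."""
    return 0 if logit_a > logit_b else 1


# Hypothetical partial sums feeding one logit. The small term is lost
# when it is added to the large one first, because it is below half the
# spacing between adjacent float64 values near 1e16.
terms = [1e16, 0.75, -1e16]
other_logit = 0.5  # hypothetical competing logit

# "Uncompiled" order: left to right.
eager_score = reduce_in_order(terms, [0, 1, 2])
# "Compiled" order: an assumed reordering that cancels the large terms first.
compiled_score = reduce_in_order(terms, [0, 2, 1])

eager_pred = predict(eager_score, other_logit)
compiled_pred = predict(compiled_score, other_logit)

print(eager_score, compiled_score)   # 0.0 0.75
print(eager_pred, compiled_pred)     # 1 0
```

Both orders compute the same sum on paper, yet the predicted class differs. An evaluation that only ever runs the left-to-right order would never observe the second prediction, which is the gap the abstract describes for safety checks run without compilation.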

Problem

Research questions and friction points this paper is trying to address.

backdoor attacks

LLM security

inference optimization

compiler-induced vulnerabilities

model deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

optimization-triggered backdoor

LLM compilation

numerical side effects