LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation

📅 2025-11-11
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to energy-latency attacks during inference. Unlike conventional approaches, which prolong output by delaying termination-token emission, the proposed attack paradigm, repetition-induced generation, uses prompt optimization to trigger low-entropy decoding loops, sharply inflating output length, computational cost, and latency. Methodologically, the paper introduces a gradient-aligned ensemble optimization strategy that combines token-level alignment with repetition-inducing prompt design to improve cross-model transferability. Experiments across 14 mainstream LLMs show that the attack extends generated sequences to over 90% of the context window (versus roughly 20% for baselines) and raises cross-model attack success rates on DeepSeek-V3 and Gemini 2.5 Flash by about 40% over state-of-the-art methods.
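The "low-entropy decoding loop" signal the summary describes can be illustrated with a toy check: once a model falls into repetition, its next-token distribution collapses onto a few tokens, so per-step Shannon entropy drops and stays low. The distributions, threshold, and window below are illustrative assumptions, not values from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy next-token distributions: a step inside a repetition loop is
# sharply peaked, while a normal decoding step spreads mass more evenly.
loop_step = [0.97, 0.01, 0.01, 0.01]
diverse_step = [0.25, 0.25, 0.25, 0.25]

def in_low_entropy_loop(step_entropies, threshold=0.2, window=3):
    """Flag decoding once entropy stays below a threshold for `window`
    consecutive steps (threshold and window are illustrative choices)."""
    recent = step_entropies[-window:]
    return len(recent) == window and all(h < threshold for h in recent)
```

Under this toy measure, a run of loop-like steps is flagged while diverse steps are not, which is why a loop, once entered, tends to persist until the output limit is reached.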

📝 Abstract
As large language models (LLMs) scale, their inference incurs substantial computational cost, exposing them to energy-latency attacks, where crafted prompts induce high energy and latency costs. Existing attack methods aim to prolong output by delaying the generation of termination symbols. However, as the output grows longer, controlling the termination symbols through the input becomes difficult, making these methods less effective. Therefore, we propose LoopLLM, an energy-latency attack framework based on the observation that repetitive generation can trigger low-entropy decoding loops, reliably compelling LLMs to generate until their output limits. LoopLLM introduces (1) a repetition-inducing prompt optimization that exploits autoregressive vulnerabilities to induce repetitive generation, and (2) a token-aligned ensemble optimization that aggregates gradients to improve cross-model transferability. Extensive experiments on 12 open-source and 2 commercial LLMs show that LoopLLM significantly outperforms existing methods, achieving over 90% of the maximum output length, compared to 20% for baselines, and improving transferability by around 40% to DeepSeek-V3 and Gemini 2.5 Flash.
Problem

Research questions and friction points this paper is trying to address.

Inducing repetitive generation loops to maximize LLM energy-latency costs
Overcoming termination symbol limitations in existing energy-latency attacks
Improving cross-model transferability of energy-latency attacks through gradient aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Induces repetitive generation via prompt optimization
Uses token-aligned ensemble for transferability
Triggers low-entropy decoding loops to maximize output
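The token-aligned ensemble idea above can be sketched as normalized gradient aggregation over surrogate models: each surrogate contributes a gradient over a shared, aligned candidate-token axis, and the prompt update follows the strongest aggregate signal. The model names, vocabulary size, and normalized-sum rule below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical per-model gradients w.r.t. one aligned token slot
# (toy vocabulary of size 5). Real surrogates use different tokenizers;
# the token-alignment step maps them onto comparable positions first.
grads = {
    "model_a": np.array([0.9, -0.1, 0.3, 0.0, -0.4]),
    "model_b": np.array([1.8, -0.2, 0.1, 0.2, -0.9]),
}

# Normalize each model's gradient before summing so no single surrogate
# dominates, then pick the candidate token with the strongest aggregate
# signal (a common ensemble-transfer heuristic).
agg = sum(g / np.linalg.norm(g) for g in grads.values())
best_token = int(np.argmax(agg))
```

Normalizing before aggregation is what lets gradients from models with very different scales vote on equal footing, which is the intuition behind improved cross-model transferability.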
Xingyu Li
National Interdisciplinary Research Center of Engineering Physics, Institute of Computer Application, China Academy of Engineering Physics
Xiaolei Liu
National Interdisciplinary Research Center of Engineering Physics
Trustworthy AI · Data-driven Security · Privacy
Cheng Liu
National Interdisciplinary Research Center of Engineering Physics, Institute of Computer Application, China Academy of Engineering Physics
Yixiao Xu
Beijing University of Posts and Telecommunications
AI Security · adversarial example · backdoor attack
Kangyi Ding
Institute of Computer Application, China Academy of Engineering Physics
Bangzhou Xin
USTC
data privacy · machine learning
Jia-Li Yin
Fuzhou University