OVERTHINKING: Slowdown Attacks on Reasoning LLMs

📅 2025-02-04

📈 Citations: 0

✨ Influential: 0

career value

153K/year

🤖 AI Summary

This paper identifies a novel “overthinking” (OVERTHINK) slowdown attack targeting reasoning-oriented large language models (LLMs), such as those deployed in RAG systems. The attack injects semantically benign yet computationally expensive decoy queries (e.g., MDP modeling tasks) into publicly accessible context (e.g., retrieved documents), inducing the model to generate lengthy, logically correct but highly inefficient reasoning chains—thereby inflating token consumption and response latency. Crucially, OVERTHINK is non-adversarial and compliance-preserving: it evades safety filters, preserves output semantics, and exploits inherent scalability of reasoning paths for high stealth and cross-model transferability. Experiments demonstrate up to 46× latency increase on models including OpenAI o1/o3-mini and DeepSeek R1, with strong generalization across FreshQA and SQuAD benchmarks. This work establishes the first slowdown attack paradigm tailored to reasoning LLMs and proposes a defense framework integrating LLM self-auditing with system-level coordination.

Technology Category

Application Category

📝 Abstract

We increase overhead for applications that rely on reasoning LLMs-we force models to spend an amplified number of reasoning tokens, i.e.,"overthink", to respond to the user query while providing contextually correct answers. The adversary performs an OVERTHINK attack by injecting decoy reasoning problems into the public content that is used by the reasoning LLM (e.g., for RAG applications) during inference time. Due to the nature of our decoy problems (e.g., a Markov Decision Process), modified texts do not violate safety guardrails. We evaluated our attack across closed-(OpenAI o1, o1-mini, o3-mini) and open-(DeepSeek R1) weights reasoning models on the FreshQA and SQuAD datasets. Our results show up to 46x slowdown and high transferability of the attack across models. To protect applications, we discuss and implement defenses leveraging LLM-based and system design approaches. Finally, we discuss societal, financial, and energy impacts of OVERTHINK attack which could amplify the costs for third party applications operating reasoning models.

Problem

Research questions and friction points this paper is trying to address.

Slowdown attacks on reasoning LLMs

Injecting decoy reasoning problems

Defenses leveraging LLM-based approaches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inject decoy reasoning problems

Force amplified reasoning tokens

Implement LLM-based defenses

🔎 Similar Papers

Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks