🤖 AI Summary
This paper identifies a novel “overthinking” (OVERTHINK) slowdown attack targeting reasoning-oriented large language models (LLMs), such as those deployed in RAG systems. The attack injects semantically benign yet computationally expensive decoy queries (e.g., MDP modeling tasks) into publicly accessible context (e.g., retrieved documents), inducing the model to generate lengthy, logically correct but highly inefficient reasoning chains—thereby inflating token consumption and response latency. Crucially, OVERTHINK is non-adversarial and compliance-preserving: it evades safety filters, preserves output semantics, and exploits inherent scalability of reasoning paths for high stealth and cross-model transferability. Experiments demonstrate up to 46× latency increase on models including OpenAI o1/o3-mini and DeepSeek R1, with strong generalization across FreshQA and SQuAD benchmarks. This work establishes the first slowdown attack paradigm tailored to reasoning LLMs and proposes a defense framework integrating LLM self-auditing with system-level coordination.
📝 Abstract
We increase overhead for applications that rely on reasoning LLMs-we force models to spend an amplified number of reasoning tokens, i.e.,"overthink", to respond to the user query while providing contextually correct answers. The adversary performs an OVERTHINK attack by injecting decoy reasoning problems into the public content that is used by the reasoning LLM (e.g., for RAG applications) during inference time. Due to the nature of our decoy problems (e.g., a Markov Decision Process), modified texts do not violate safety guardrails. We evaluated our attack across closed-(OpenAI o1, o1-mini, o3-mini) and open-(DeepSeek R1) weights reasoning models on the FreshQA and SQuAD datasets. Our results show up to 46x slowdown and high transferability of the attack across models. To protect applications, we discuss and implement defenses leveraging LLM-based and system design approaches. Finally, we discuss societal, financial, and energy impacts of OVERTHINK attack which could amplify the costs for third party applications operating reasoning models.