POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization

📅 2025-08-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) are vulnerable to “overthinking” attacks during chain-of-thought (CoT) reasoning, in which adversarial prompts induce excessively long, logically vacuous reasoning traces that waste computational resources without altering final outputs. Method: This paper proposes POT (Prompt-Only OverThinking), which it describes as the first purely prompt-based, black-box attack framework; it requires no access to model parameters, external knowledge sources, poisoned data, or explicit templates and retrieval mechanisms. Instead, it uses LLM-driven iterative optimization to generate semantically natural, stealthy adversarial prompts. Contribution/Results: The core idea is to jointly optimize semantic fidelity and attack efficacy so that the prompt triggers redundant reasoning while the output remains plausible. Experiments across multiple LLMs and standard CoT benchmarks report higher overthinking rates than prior methods, along with better stealth and cross-model generalizability.

📝 Abstract

Recent advances in Chain-of-Thought (CoT) prompting have substantially enhanced the reasoning capabilities of large language models (LLMs), enabling sophisticated problem-solving through explicit multi-step reasoning traces. However, these enhanced reasoning processes introduce novel attack surfaces, particularly vulnerabilities to computational inefficiency through unnecessarily verbose reasoning chains that consume excessive resources without corresponding performance gains. Prior overthinking attacks typically require restrictive conditions including access to external knowledge sources for data poisoning, reliance on retrievable poisoned content, and structurally obvious templates that limit practical applicability in real-world scenarios. To address these limitations, we propose POT (Prompt-Only OverThinking), a novel black-box attack framework that employs LLM-based iterative optimization to generate covert and semantically natural adversarial prompts, eliminating dependence on external data access and model retrieval. Extensive experiments across diverse model architectures and datasets demonstrate that POT achieves superior performance compared to other methods.
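The abstract describes the cost of overthinking as extra reasoning effort with no change in the result. One way to make that concrete when evaluating a model's robustness is a simple token-overhead ratio. The sketch below is illustrative only: the `Trace` type, the function names, and the choice of metric are assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One model response (hypothetical container, not from the paper)."""
    reasoning_tokens: int  # tokens spent on the reasoning trace
    answer: str            # final answer extracted from the response


def overthinking_overhead(baseline: Trace, attacked: Trace) -> float:
    """Ratio of attacked to baseline reasoning tokens (1.0 means no overhead)."""
    if baseline.reasoning_tokens <= 0:
        raise ValueError("baseline must contain at least one reasoning token")
    return attacked.reasoning_tokens / baseline.reasoning_tokens


def answer_preserved(baseline: Trace, attacked: Trace) -> bool:
    """True when the final answer is unchanged, ignoring case and whitespace."""
    return baseline.answer.strip().lower() == attacked.answer.strip().lower()


# Made-up numbers, purely to show how the two checks are used together.
baseline = Trace(reasoning_tokens=200, answer="42")
attacked = Trace(reasoning_tokens=900, answer="42")
print(overthinking_overhead(baseline, attacked), answer_preserved(baseline, attacked))
```

A high ratio combined with a preserved answer matches the behaviour the abstract describes: more computation consumed, same final output.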

Problem

Research questions and friction points this paper is trying to address.

Inducing inefficient verbose reasoning in LLMs

Eliminating dependency on external knowledge sources

Generating covert adversarial prompts through iterative optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box iterative optimization for overthinking

Generates covert adversarial prompts without external data

Eliminates reliance on retrievable poisoned content
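The points above describe a black-box loop in which candidates are proposed, scored using only model outputs, and kept when they improve. A minimal sketch of that general pattern follows, written as plain hill climbing. It is not the paper's algorithm: `optimize_prompt`, `propose`, and `score` are hypothetical names, and the toy stand-ins at the bottom replace what would, in a robustness evaluation, be an LLM-based rewriter and a scorer that combines observed reasoning length with semantic similarity to the original prompt.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]


def optimize_prompt(
    seed: str,
    propose: Callable[[str, History], str],
    score: Callable[[str], float],
    rounds: int = 5,
) -> Tuple[str, float]:
    """Generic black-box hill climbing: keep a candidate only if it scores higher.

    `propose` and `score` are injected callables (assumptions for this sketch);
    the loop never inspects model parameters, only the scores it gets back.
    """
    best, best_score = seed, score(seed)
    history: History = [(best, best_score)]
    for _ in range(rounds):
        candidate = propose(best, history)
        candidate_score = score(candidate)
        history.append((candidate, candidate_score))
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the loop runs offline; they carry no attack content.
def toy_propose(current: str, history: History) -> str:
    return current + "!"


def toy_score(prompt: str) -> float:
    return float(min(len(prompt), 5))


best_prompt, best_value = optimize_prompt("abc", toy_propose, toy_score, rounds=5)
print(best_prompt, best_value)
```

Passing the scorer in as a callable keeps the loop independent of any particular model API, which is the property that makes the setting black-box.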
