AI Summary
This work addresses a security vulnerability in large reasoning models: their tendency to "overthink" when confronted with logically inconsistent or incomplete inputs. Such inputs can trigger excessively long reasoning chains that exhaust computational resources, exposing a denial-of-service attack surface. The study characterizes overthinking as an exploitable flaw and introduces a black-box adversarial framework based on a hierarchical genetic algorithm. The framework automatically generates adversarial prompts by perturbing the logical structure of queries and optimizing a composite fitness function that jointly rewards response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the method amplifies output length by up to 26.1× on the MATH benchmark, consistently outperforming both benign and manually crafted missing-premise baselines. Adversarial inputs evolved on a small proxy model also transfer to large commercial systems.
Abstract
Large Reasoning Models (LRMs) are increasingly integrated into systems requiring reliable multi-step inference, yet this growing dependence exposes new vulnerabilities related to computational availability. In particular, LRMs exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces, when confronted with incomplete or logically inconsistent inputs. This behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS) style resource exhaustion. In this work, we investigate this attack surface and propose an automated black-box framework that induces overthinking in LRMs by systematically perturbing the logical structure of input problems. Our method employs a hierarchical genetic algorithm (HGA) operating on structured problem decompositions, and optimizes a composite fitness function designed to maximize both response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise baselines. We further demonstrate strong transferability, showing that adversarial inputs evolved using a small proxy model retain high effectiveness against large commercial LRMs. These findings highlight overthinking as a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.
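To make the idea of a composite fitness function concrete, the following is a minimal sketch of a score that combines response length with a count of reflective markers. It is not the authors' implementation: the marker list, the weights `alpha` and `beta`, the whitespace-based length measure, and the function names are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical marker list; the abstract does not say which markers are used.
REFLECTIVE_MARKERS = (
    "wait",
    "hmm",
    "alternatively",
    "let me reconsider",
    "on second thought",
)


def count_markers(response: str, markers=REFLECTIVE_MARKERS) -> int:
    """Count case-insensitive whole-phrase occurrences of reflective markers."""
    text = response.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
        for marker in markers
    )


def composite_fitness(response: str, alpha: float = 1.0, beta: float = 50.0) -> float:
    """Weighted sum of response length and reflective-marker count.

    Length is approximated by whitespace-separated tokens (an assumption);
    alpha and beta are illustrative weights, not values from the paper.
    """
    length = len(response.split())
    return alpha * length + beta * count_markers(response)
```

The same score can serve a defensive purpose: a response whose length or marker count far exceeds what is typical for a task is a candidate for early termination or rate limiting.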