Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a security vulnerability in large reasoning models: when confronted with logically inconsistent or incomplete inputs, they tend to "overthink", producing excessively verbose reasoning chains that exhaust computational resources and expose a denial-of-service attack surface. The study characterizes overthinking as an exploitable flaw and introduces a black-box adversarial framework based on a hierarchical genetic algorithm. The framework automatically generates adversarial prompts by perturbing the logical structure of queries and optimizing a composite fitness function that jointly rewards response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the method increases output length by up to 26.1× on the MATH benchmark and consistently outperforms both benign and manually crafted missing-premise baselines. Adversarial inputs evolved against a small proxy model also transfer to large commercial reasoning models.
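To make the idea of a composite fitness function concrete, the following is a minimal sketch that scores a response by its length plus a weighted count of reflective markers. The marker list, the weights `alpha` and `beta`, and the whitespace token count are all assumptions for illustration; the summary does not give the paper's actual formula.

```python
import re

# Hypothetical marker list: the source does not enumerate the markers it counts.
REFLECTIVE_MARKERS = ("wait", "hmm", "alternatively", "let me reconsider")


def composite_fitness(response: str, alpha: float = 1.0, beta: float = 50.0) -> float:
    """Score a response by length plus a weighted count of reflective markers.

    alpha and beta are illustrative weights, not values taken from the paper.
    """
    # Whitespace-separated tokens serve as a crude proxy for output length.
    length = len(response.split())
    lowered = response.lower()
    # Count whole-word (or whole-phrase) occurrences of each marker.
    marker_hits = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in REFLECTIVE_MARKERS
    )
    return alpha * length + beta * marker_hits
```

A scorer of this shape can also serve defensively, for example to flag responses whose length and marker counts are unusually high.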

📝 Abstract

Large Reasoning Models (LRMs) are increasingly integrated into systems requiring reliable multi-step inference, yet this growing dependence exposes new vulnerabilities related to computational availability. In particular, LRMs exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces, when confronted with incomplete or logically inconsistent inputs. This behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS) style resource exhaustion. In this work, we investigate this attack surface and propose an automated black-box framework that induces overthinking in LRMs by systematically perturbing the logical structure of input problems. Our method employs a hierarchical genetic algorithm (HGA) that operates on structured problem decompositions and optimizes a composite fitness function designed to maximize both response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise baselines. We further demonstrate strong transferability, showing that adversarial inputs evolved using a small proxy model retain high effectiveness against large commercial LRMs. These findings highlight overthinking as a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.
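The general shape of a two-level genetic search over a structured problem decomposition might look like the toy sketch below. Everything in it is an assumption for illustration: the `Problem` representation, the two mutation operators, the `structural_rate` parameter, and the elitist selection are hypothetical, and the fitness is passed in as a plain function rather than computed from any model's response. It is not the authors' HGA.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Problem:
    """Hypothetical decomposition: a tuple of premises plus a question."""
    premises: Tuple[str, ...]
    question: str

    def render(self) -> str:
        return " ".join(self.premises + (self.question,))


def mutate_structure(p: Problem, rng: random.Random) -> Problem:
    """Upper level: remove one whole premise."""
    if not p.premises:
        return p
    i = rng.randrange(len(p.premises))
    return Problem(p.premises[:i] + p.premises[i + 1:], p.question)


def mutate_wording(p: Problem, rng: random.Random) -> Problem:
    """Lower level: drop one word inside one premise."""
    if not p.premises:
        return p
    i = rng.randrange(len(p.premises))
    words = p.premises[i].split()
    if len(words) < 2:
        return p
    j = rng.randrange(len(words))
    edited = " ".join(words[:j] + words[j + 1:])
    return Problem(p.premises[:i] + (edited,) + p.premises[i + 1:], p.question)


def evolve(seed: Problem, fitness: Callable[[Problem], float],
           generations: int = 5, pop_size: int = 8,
           structural_rate: float = 0.5, rng_seed: int = 0) -> Problem:
    """Elitist loop: parents stay in the pool, so the best score never drops."""
    rng = random.Random(rng_seed)
    population = [seed]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent = rng.choice(population)
            # Choose which level of the hierarchy to mutate.
            if rng.random() < structural_rate:
                children.append(mutate_structure(parent, rng))
            else:
                children.append(mutate_wording(parent, rng))
        # De-duplicate while preserving order, then keep the top candidates.
        pool = list(dict.fromkeys(population + children))
        population = sorted(pool, key=fitness, reverse=True)[:pop_size]
    return population[0]
```

The stand-in fitness used to exercise this loop can be any function of a `Problem`, such as one that simply prefers fewer premises. Because parents remain in the selection pool, the returned candidate never scores below the seed.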

Problem

Research questions and friction points this paper is trying to address.

overthinking

denial-of-service

large reasoning models

computational availability

black-box attack

Innovation

Methods, ideas, or system contributions that make the work stand out.

overthinking

hierarchical genetic algorithm

black-box attack