ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of large reasoning models (LRMs) to prompt-induced denial-of-service (PI-DoS) attacks, which are hard to detect because they impose high computational cost while remaining perceptually inconspicuous. The study formally characterizes PI-DoS attacks through three core properties: a high amplification ratio, stealthiness, and optimizability. To exploit this vulnerability, the authors propose a reinforcement learning–based adversarial framework that uses a constant-time surrogate reward to generate extremely short, semantically natural prompts that drive victim LRMs into pathologically long reasoning. Evaluated across seven open-source and three commercial models, the method induces an average of 19,263 reasoning tokens per query, achieving an input-output amplification ratio of 286.7x while evading both single-stage and joint distribution-shift detectors at over a 98% bypass rate.

📝 Abstract
Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service (PI-DoS) attacks that exploit the high computational cost of reasoning. We first formalize inference cost for LRMs and define PI-DoS, then prove that any practical PI-DoS attack should satisfy three properties: (i) a high amplification ratio, where each query induces a disproportionately long reasoning trace relative to its own length; (ii) stealthiness, in which prompts and responses remain on the natural language manifold and evade distribution-shift detectors; and (iii) optimizability, in which the attack supports efficient optimization without being slowed by its own success. Under this framework, we present ReasoningBomb, a reinforcement-learning-based PI-DoS framework that is guided by a constant-time surrogate reward and trains a large reasoning-model attacker to generate short natural prompts that drive victim LRMs into pathologically long and often effectively non-terminating reasoning. Across seven open-source models (including LLMs and LRMs) and three commercial LRMs, ReasoningBomb induces 18,759 completion tokens on average and 19,263 reasoning tokens on average across reasoning models. It outperforms the runner-up baseline by 35% in completion tokens and 38% in reasoning tokens, while inducing 6-7x more tokens than benign queries and achieving a 286.7x input-to-output amplification ratio averaged across all samples. Additionally, our method achieves a 99.8% bypass rate on input-based detection, 98.7% on output-based detection, and 98.4% against strict dual-stage joint detection.
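As a concrete illustration of the amplification-ratio metric the abstract reports, here is a minimal sketch; the specific token counts used below are hypothetical examples, not data from the paper.

```python
def amplification_ratio(prompt_tokens: int, output_tokens: int) -> float:
    """Input-to-output amplification: tokens the victim model emits
    divided by tokens the attacker's prompt contains."""
    if prompt_tokens <= 0:
        raise ValueError("prompt must contain at least one token")
    return output_tokens / prompt_tokens

# Hypothetical example: a ~67-token adversarial prompt that triggers
# ~19,263 reasoning tokens lands near the paper's reported 286.7x average.
ratio = amplification_ratio(prompt_tokens=67, output_tokens=19263)
print(f"{ratio:.1f}x")  # 287.5x with these illustrative numbers
```

The metric makes the attack's asymmetry explicit: the attacker's cost scales with the short prompt, while the victim's cost scales with the pathologically long reasoning trace.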
Problem

Research questions and friction points this paper is trying to address.

Denial-of-Service
Large Reasoning Models
Prompt-induced Attack
Inference-time Attack
Reasoning Trace
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReasoningBomb
PI-DoS
Large Reasoning Models
Reinforcement Learning
Stealthy Attack