🤖 AI Summary
This work addresses a critical security vulnerability in Large Reasoning Models (LRMs), which are prone to resource exhaustion through excessive reasoning during self-reflection. The study introduces, for the first time, recursive entropy as a quantitative measure of resource-consumption trends in the reflection phase, exposing a safety flaw inherent to LRMs. By constructing counterfactual queries that trigger redundant reasoning chains, the authors generate adversarial inputs that induce uncontrolled resource usage. Experiments show that the attack disrupts the natural decay of recursive entropy observed under benign reasoning, inflating model output length by up to 11x and reducing throughput by 90%. These findings substantiate both the feasibility and the severity of resource-abuse attacks against LRMs.
📝 Abstract
Large Reasoning Models (LRMs) employ explicit reasoning to address complex tasks. Such reasoning requires extended context lengths, resulting in substantially higher resource consumption. Prior work has shown that adversarially crafted inputs can trigger redundant reasoning processes, exposing LRMs to resource-exhaustion vulnerabilities. However, the reasoning process itself, especially its reflective component, has received limited attention, even though it can lead to over-reflection and consume excessive compute. In this paper, we introduce Recursive Entropy to quantify the risk of resource consumption during reflection, revealing the safety issues inherent in inference itself. Building on Recursive Entropy, we propose RECUR, a resource-exhaustion attack via Recursive Entropy guided Counterfactual Utilization and Reflection, which constructs counterfactual questions that expose the inherent flaws and risks of LRMs. Extensive experiments demonstrate that, under benign inference, recursive entropy exhibits a pronounced decreasing trend; RECUR disrupts this trend, increasing output length by up to 11x and decreasing throughput by 90%. Our work provides a new perspective on robust reasoning.
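The abstract does not give the paper's formal definition of Recursive Entropy, only that it decays under benign inference. As a purely illustrative sketch, one might measure the Shannon entropy of the empirical token distribution in each successive reflection round and test whether it decreases; the entropy definition and the toy "reflection rounds" below are assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the empirical token distribution.
    Hypothetical stand-in for the paper's per-round entropy measure."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy reflection rounds: under benign reasoning, reflections converge,
# so the per-round entropy should trend downward.
benign_rounds = [
    "check the sum of both sides again carefully".split(),
    "the sum matches so the answer holds".split(),
    "answer holds answer holds".split(),
]
entropies = [shannon_entropy(r) for r in benign_rounds]
decreasing = all(a >= b for a, b in zip(entropies, entropies[1:]))
print(decreasing)  # → True: the benign trend the attack would disrupt
```

Under this reading, an attack like RECUR would succeed when the sequence of per-round entropies stops decaying, signaling redundant, non-converging reflection.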