🤖 AI Summary
This work investigates how inference-time compute scaling affects model robustness in adversarial settings where intermediate reasoning steps are observable to attackers. We find that, when reasoning chains are exposed, increasing inference-time computation systematically degrades robustness, a phenomenon termed an "inverse scaling law" that challenges the prevailing assumption that more computation inherently improves performance or security. Methodologically, we propose a budget-constrained inference scaling strategy and develop an evaluation framework integrating reasoning-chain extraction with strong adversarial attacks. Experiments across open-source models confirm the generality of the effect, with the deployment environment critically modulating scaling behavior. Our core contribution is the first formalization and empirical demonstration of the safety risks arising from inference-time scaling in open reasoning settings, providing insights and theoretical grounding for the design of trustworthy reasoning architectures.
📝 Abstract
Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-reasoning) can also benefit from inference-time scaling using a simple budget forcing strategy. More importantly, we reveal and critically examine an implicit assumption in prior work: that intermediate reasoning steps are hidden from adversaries. By relaxing this assumption, we identify an important security risk, which we motivate intuitively and verify empirically as an inverse scaling law: if intermediate reasoning steps become explicitly accessible, increased inference-time computation consistently reduces model robustness. Finally, we discuss practical scenarios in which models with hidden reasoning chains remain vulnerable, such as tool-integrated reasoning and advanced reasoning-extraction attacks. Our findings collectively demonstrate that the robustness benefits of inference-time scaling depend heavily on the adversarial setting and deployment context. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.
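As context for the budget forcing strategy mentioned above: the general idea (popularized by the s1 line of work) is to steer the length of a model's reasoning chain at decode time, appending a continuation cue such as "Wait" when the model tries to stop thinking too early, and injecting the end-of-thinking delimiter when the token budget is exhausted. The sketch below is illustrative only; the generator stub, the `</think>` delimiter, and the specific budgets are assumptions, not this paper's implementation.

```python
# Hedged sketch of budget forcing. `model_generate`, the delimiter,
# and the budgets below are illustrative stand-ins, not a real API.

END_THINK = "</think>"  # end-of-thinking delimiter (model-specific)

def model_generate(prompt: str, max_tokens: int) -> str:
    """Stub standing in for a real LLM decode call: emits a short
    reasoning chunk, then tries to end its thinking."""
    return "step... " * min(max_tokens, 4) + END_THINK

def budget_forced_reasoning(prompt: str, min_tokens: int = 16,
                            max_tokens: int = 64, max_retries: int = 3) -> str:
    """Force the reasoning chain toward a target token budget.

    - If the model stops thinking too early, strip the end delimiter and
      append a "Wait" cue to coax further reasoning (bounded retries).
    - Once the budget or retry limit is reached, close the chain with
      the end-of-thinking delimiter.
    """
    chain = ""
    for _ in range(max_retries):
        remaining = max_tokens - len(chain.split())
        if remaining <= 0:
            break
        out = model_generate(prompt + chain, max_tokens=remaining)
        chain += out.replace(END_THINK, "")  # keep thinking open
        if len(chain.split()) >= min_tokens:
            break  # minimum reasoning budget met
        chain += " Wait, "  # continuation cue to extend reasoning
    return chain.strip() + " " + END_THINK
```

With a real model, the same loop lets an experimenter dial reasoning length up or down, which is exactly the knob the robustness-versus-compute experiments vary.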