🤖 AI Summary
Making large language models (LLMs) robust to adversarial attacks remains an open problem, especially without costly adversarial training.
Method: This work studies how scaling inference-time compute in reasoning models (specifically OpenAI o1-preview and o1-mini) affects their robustness to adversarial attacks, without modifying model parameters or performing adversarial training. The models are simply allowed to spend more compute on reasoning, independently of the form of attack; the evaluation also includes new attacks directed specifically at reasoning models.
Contribution/Results: Across a variety of attacks, increased inference-time compute improves robustness; in many cases (with important exceptions), the fraction of model samples where the attack succeeds tends to zero as test-time compute grows. The work also examines settings where additional inference-time compute does not improve reliability, speculates on the reasons for these failures, and discusses possible remedies. Overall, the results suggest inference-time compute as a lightweight, training-free lever for improving the adversarial robustness of LLMs.
📝 Abstract
We conduct experiments on the impact of increasing inference-time compute in reasoning models (specifically OpenAI o1-preview and o1-mini) on their robustness to adversarial attacks. We find that across a variety of attacks, increased inference-time compute leads to improved robustness. In many cases (with important exceptions), the fraction of model samples where the attack succeeds tends to zero as the amount of test-time compute grows. We perform no adversarial training for the tasks we study, and we increase inference-time compute by simply allowing the models to spend more compute on reasoning, independently of the form of attack. Our results suggest that inference-time compute has the potential to improve adversarial robustness for Large Language Models. We also explore new attacks directed at reasoning models, as well as settings where inference-time compute does not improve reliability, and speculate on the reasons for these as well as ways to address them.
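The core measurement in the abstract — the fraction of model samples where an attack succeeds, tracked as the inference-time compute budget grows — can be illustrated with a minimal toy sketch. The `attack_succeeds` function below is a hypothetical stand-in (not the paper's actual evaluation harness) that simulates a model whose chance of being compromised shrinks as it is allowed more reasoning compute:

```python
import random

def attack_succeeds(rng: random.Random, inference_budget: float) -> bool:
    """Toy stand-in for querying a reasoning model under an adversarial attack.

    Assumption (not from the paper): the per-sample probability that the
    attack succeeds decays as the reasoning budget grows.
    """
    p = 0.5 / (1.0 + inference_budget)
    return rng.random() < p

def attack_success_rate(inference_budget: float,
                        n_samples: int = 10_000,
                        seed: int = 0) -> float:
    """Fraction of sampled model responses where the attack succeeds."""
    rng = random.Random(seed)
    hits = sum(attack_succeeds(rng, inference_budget) for _ in range(n_samples))
    return hits / n_samples

# Sweep the inference-time compute budget and watch the success rate fall.
budgets = [0, 1, 4, 16, 64]
rates = [attack_success_rate(b) for b in budgets]
for b, r in zip(budgets, rates):
    print(f"budget={b:>3}  attack success rate={r:.4f}")
```

In the simulation the success rate declines monotonically toward zero as the budget grows, mirroring the qualitative trend the abstract reports; in the paper's real experiments this decline holds for many attacks but, importantly, not all.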