Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing test-time inference methods struggle to enhance adversarial out-of-distribution (OOD) robustness when attackers have access to gradients or multimodal inputs. Method: The paper proposes the Robustness from Inference Compute Hypothesis (RICH), positing that the defensive efficacy of inference computation (e.g., reasoning effort or prompting that emphasizes defensive specifications) depends critically on how well the model's training data covers the components of the attacked data. Increased inference overhead yields significant defense gains only when the base model already possesses non-negligible intrinsic robustness, for instance from adversarial pretraining. Results: Experiments on vision-language models empirically support a synergistic, self-reinforcing mechanism: stronger training-time robustness amplifies the defensive impact of test-time computation, forming a "rich-get-richer" positive feedback loop. This work systematically characterizes the boundary conditions and operational principles governing training–test co-defense, clarifying when and why inference-level interventions succeed or fail.

📝 Abstract
Models are susceptible to adversarially out-of-distribution (OOD) data despite large training-compute investments into their robustification. Zaremba et al. (2025) make progress on this problem at test time, showing LLM reasoning improves satisfaction of model specifications designed to thwart attacks, resulting in a correlation between reasoning effort and robustness to jailbreaks. However, this benefit of test compute fades when attackers are given access to gradients or multimodal inputs. We address this gap, clarifying that inference-compute offers benefits even in such cases. Our approach argues that compositional generalization, through which OOD data is understandable via its in-distribution (ID) components, enables adherence to defensive specifications on adversarially OOD inputs. Namely, we posit the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses profit as the model's training data better reflects the attacked data's components. We empirically support this hypothesis across vision language model and attack types, finding robustness gains from test-time compute if specification following on OOD data is unlocked by compositional generalization, while RL finetuning and protracted reasoning are not critical. For example, increasing emphasis on defensive specifications via prompting lowers the success rate of gradient-based multimodal attacks on VLMs robustified by adversarial pretraining, but this same intervention provides no such benefit to not-robustified models. This correlation of inference-compute's robustness benefit with base model robustness is the rich-get-richer dynamic of the RICH: attacked data components are more ID for robustified models, aiding compositional generalization to OOD data. Accordingly, we advise layering train-time and test-time defenses to obtain their synergistic benefit.
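The abstract's threat model centers on gradient-based attacks, where the attacker uses the model's own gradients to craft adversarially OOD inputs. The sketch below illustrates that mechanism with a projected-gradient (PGD-style) loop; it is a minimal toy, not the paper's setup: a tiny logistic-regression "model" stands in for a VLM so the gradient mechanics stay explicit, and all names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Toy PGD-style attack: ascend the model's loss w.r.t. the input, while
# projecting the perturbation into a small L-infinity ball around the clean
# input. This is the gradient-access threat model under which the abstract
# notes that pure test-time defenses lose effectiveness.

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # fixed "model" weights (stand-in for a VLM)
x = rng.normal(size=8)   # clean input, true label y = 1
y = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x_in):
    # binary cross-entropy of p(y=1 | x) = sigmoid(w . x)
    p = sigmoid(w @ x_in)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def loss_grad(x_in):
    # d loss / d x = (p - y) * w for this model
    return (sigmoid(w @ x_in) - y) * w

eps, alpha, steps = 0.3, 0.05, 40   # illustrative attack budget
x_adv = x.copy()
for _ in range(steps):
    x_adv = x_adv + alpha * np.sign(loss_grad(x_adv))  # step up the loss
    x_adv = x + np.clip(x_adv - x, -eps, eps)          # project into the ball

print(f"clean loss: {loss(x):.4f}, adversarial loss: {loss(x_adv):.4f}")
```

Under the RICH framing, train-time robustification makes such perturbed inputs' components more in-distribution, which is what lets test-time interventions (like prompted emphasis on defensive specifications) regain traction against this attack class.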
Problem

Research questions and friction points this paper is trying to address.

Enhancing model robustness against adversarial out-of-distribution data attacks
Leveraging inference compute to improve defense even when attackers have gradient or multimodal access
Establishing compositional generalization as key for test-time compute robustness gains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging compositional generalization for OOD robustness
Using inference compute to enhance adherence to defensive specifications
Synergizing train-time and test-time defenses for attack resilience