🤖 AI Summary
Large language models exhibit fragile reasoning capabilities when confronted with non-ideal contexts containing redundant, irrelevant, or false information. This work proposes an adversarial self-play reinforcement learning framework in which a single model with shared parameters both generates distracting contexts and performs the target reasoning task. By modeling context perturbations as learnable adversarial signals, the approach establishes a dynamically co-evolving curriculum that compels the model to move beyond superficial pattern matching and develop deeper, more robust reasoning mechanisms. Evaluated across seven mathematical reasoning benchmarks, models ranging from 4B to 30B parameters achieve average score improvements of 7.2–10.2 points. Moreover, distractors generated by the 4B model degrade the accuracy of closed-source models such as GPT and Gemini by roughly 4–5 points.
📝 Abstract
We present Seirênes, a self-play RL framework that transforms contextual interference from a failure mode of LLM reasoning into an internal training signal for co-evolving more resilient reasoners. While RL with verifiable rewards has significantly advanced reasoning capabilities, models can still exhibit fragility when encountering non-idealized contexts: scenarios characterized by superfluous information, tangential instructions, or incidental correlations that differ from the clean distributions typical of standard benchmarks. Seirênes harnesses this vulnerability through a parameter-shared, adversarial self-play loop. Within this framework, a single model is trained both to construct plausible yet distracting contexts that expose its own reasoning blind spots and to solve problems by separating the essential task from these perturbations and recovering the core underlying logic. By pitting these competing objectives against each other, Seirênes compels the model to move beyond superficial pattern matching and anchors its capabilities in robust underlying reasoning. This continuous interaction sustains an informative co-evolutionary curriculum as the model improves. Across seven mathematical reasoning benchmarks and model scales from 4B to 30B, Seirênes achieves average gains of +10.2, +9.1, and +7.2 points. In addition, distracting contexts produced by the 4B Seirênes model reduce the accuracy of top-tier closed-source models (GPT and Gemini) by roughly 4–5 points, revealing Seirênes' general ability to uncover reasoning models' blind spots.
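The adversarial loop described above can be illustrated with a minimal sketch of how rewards for the two roles might be assigned in one episode. The abstract does not specify the reward design, so everything here is an assumption for illustration: the function `self_play_rewards`, the `Episode` container, exact-match answer checking, and the zero-sum rule that pays the distractor role `1 - solver accuracy` are all hypothetical. In the parameter-shared setting, both callables would wrap the same policy under different role prompts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    """One self-play episode (hypothetical container, not from the paper)."""
    question: str
    distractor: str
    solver_answers: List[str]
    solver_rewards: List[float]
    distractor_reward: float


def self_play_rewards(
    question: str,
    reference: str,
    generate_distractor: Callable[[str], str],
    solve: Callable[[str], str],
    n_samples: int = 4,
) -> Episode:
    """Score both roles for one question.

    Assumption: the solver earns a verifiable 0/1 reward per sample via
    exact match, and the distractor role earns 1 - mean solver accuracy,
    so it is rewarded only when its context actually misleads the solver.
    """
    # Role 1: the model writes a plausible but distracting context.
    distractor = generate_distractor(question)

    # Role 2: the same model answers the question with the context attached.
    prompt = f"{distractor}\n\n{question}"
    answers = [solve(prompt) for _ in range(n_samples)]

    # Verifiable reward: 1.0 for a correct final answer, otherwise 0.0.
    solver_rewards = [
        1.0 if answer.strip() == reference.strip() else 0.0
        for answer in answers
    ]

    # Assumed zero-sum signal for the distractor role.
    accuracy = sum(solver_rewards) / n_samples
    distractor_reward = 1.0 - accuracy

    return Episode(question, distractor, answers, solver_rewards, distractor_reward)
```

In a real training run, the per-role rewards would feed a policy-gradient update on the shared parameters; that step is omitted here because the abstract gives no detail about the optimizer or the exact objective.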