🤖 AI Summary
This work addresses the limitations of traditional logic-reasoning data synthesis, which relies on expert-authored rules or fixed templates and supports only instance-level perturbations, restricting task diversity and difficulty. To overcome this, the authors propose SSLogic, a framework that elevates the evolvable unit from individual problem instances to task-family specifications. Within a closed generate-validate-refine pipeline, LLM agents iteratively author and optimize executable Generator-Validator pairs. The approach incorporates multi-strategy consensus and an Adversarial Blind Review mechanism to ensure both the validity and the difficulty of synthesized tasks. Starting from 400 seed task families, SSLogic expands to 953 families and produces 21,389 verifiable instances, yielding higher training utility than external Enigmata data; fine-grained KORBench evaluation shows improvements in logical reasoning of 13.2% and operational reasoning of 9.6%.
📝 Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) is bottlenecked by data: existing synthesis pipelines rely on expert-written code or fixed templates, confining growth to instance-level perturbations. We shift the evolvable unit from problem instances to task-family specifications. SSLogic is an agentic meta-synthesis framework in which LLM agents iteratively author and refine executable Generator-Validator pairs inside a closed Generate-Validate-Refine loop, producing families with new rules and difficulty gradients rather than parameter variations of old ones. A Multi-Gate Validation Protocol -- multi-strategy consensus plus Adversarial Blind Review, where independent agents solve each instance by writing and executing code -- filters ill-posed tasks before they enter training. Starting from 400 seed families, two evolution rounds yield 953 families and 21,389 verifiable instances. Three converging comparisons (step-matched, token-matched, and size-controlled against external Enigmata data) consistently show higher training utility of evolved data, with gains over Enigmata-trained baselines of +5.2 on SynLogic, +3.0 on AIME25, and +5.5 on BBH. Fine-grained KORBench evaluation reveals selective improvements in logic (+13.2%) and operation (+9.6%), linking structural evolution to downstream gains. Code: https://github.com/AdAstraAbyssoque/Scaling-the-Scaling-Logic
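To make the Generator-Validator pairing and consensus gating concrete, here is a minimal sketch in Python. Everything in it is illustrative: the task family ("parity chain" puzzles), the function names (`generate`, `validate`, `consensus_gate`), and the quorum threshold are hypothetical stand-ins, since in SSLogic the pairs are authored and refined by LLM agents rather than hand-written.

```python
import random

def generate(difficulty: int, seed: int) -> dict:
    """Generator: emit one task instance plus a hidden ground-truth answer.

    A larger `difficulty` lengthens the bit string, giving the task family
    a tunable difficulty gradient (a property the paper attributes to
    evolved families).
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(3 + difficulty)]
    prompt = f"Is the number of 1s in {bits} even? Answer 'yes' or 'no'."
    answer = "yes" if sum(bits) % 2 == 0 else "no"
    return {"prompt": prompt, "answer": answer}

def validate(instance: dict, candidate: str) -> bool:
    """Validator: programmatically check a candidate answer against the
    instance's ground truth, giving a verifiable reward signal."""
    return candidate.strip().lower() == instance["answer"]

def consensus_gate(instance: dict, solvers, quorum: int = 2) -> bool:
    """Multi-strategy consensus gate (sketch): keep an instance only if at
    least `quorum` independent solvers reproduce the verified answer.
    Ill-posed or unsolvable instances fail the gate and are filtered
    out before training."""
    agree = sum(validate(instance, solve(instance["prompt"])) for solve in solvers)
    return agree >= quorum
```

In this sketch, `solvers` would be independent solving strategies (in the paper, agents that write and execute code under blind review); an instance that no quorum of solvers can crack is treated as ill-posed and discarded.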