🤖 AI Summary
This work addresses the limitations of traditional logic-reasoning data synthesis, which relies on expert-authored rules or fixed templates and supports only instance-level perturbations, restricting task diversity and difficulty. To overcome this, the authors propose SSLogic, a framework that elevates the evolvable unit from individual problem instances to task-family specifications. Within a closed generate-validate-refine pipeline, LLM agents iteratively author and optimize executable Generator-Validator pairs. The approach incorporates multi-strategy consensus and an Adversarial Blind Review mechanism to ensure both the validity and the difficulty of synthesized tasks. Starting from 400 seed task families, SSLogic expands to 953 families and produces 21,389 verifiable instances, yielding higher training utility than external Enigmata data; fine-grained KORBench evaluation shows improvements in logical reasoning of 13.2% and operational reasoning of 9.6%.
📝 Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) is bottlenecked by data: existing synthesis pipelines rely on expert-written code or fixed templates, confining growth to instance-level perturbations. We shift the evolvable unit from problem instances to task-family specifications. SSLogic is an agentic meta-synthesis framework in which LLM agents iteratively author and refine executable Generator-Validator pairs inside a closed Generate-Validate-Refine loop, producing families with new rules and difficulty gradients rather than parameter variations of old ones. A Multi-Gate Validation Protocol -- multi-strategy consensus plus Adversarial Blind Review, where independent agents solve each instance by writing and executing code -- filters ill-posed tasks before they enter training. Starting from 400 seed families, two evolution rounds yield 953 families and 21,389 verifiable instances. Three converging comparisons (step-matched, token-matched, and size-controlled against external Enigmata data) consistently show higher training utility of evolved data, with gains over Enigmata-trained baselines of +5.2 on SynLogic, +3.0 on AIME25, and +5.5 on BBH. Fine-grained KORBench evaluation reveals selective improvements in logic (+13.2%) and operation (+9.6%), linking structural evolution to downstream gains. Code: https://github.com/AdAstraAbyssoque/Scaling-the-Scaling-Logic
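To make the Generator-Validator pairing and consensus gating concrete, here is a minimal sketch in Python. Everything in it is illustrative: the task family ("parity chain" puzzles), the function names (`generate`, `validate`, `consensus_gate`), and the quorum threshold are hypothetical stand-ins, since in SSLogic the pairs are authored and refined by LLM agents rather than hand-written.

```python
import random

def generate(difficulty: int, seed: int) -> dict:
    """Generator: emit one task instance plus a hidden ground-truth answer.

    A larger `difficulty` lengthens the bit string, giving the task family
    a tunable difficulty gradient (a property the paper attributes to
    evolved families).
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(3 + difficulty)]
    prompt = f"Is the number of 1s in {bits} even? Answer 'yes' or 'no'."
    answer = "yes" if sum(bits) % 2 == 0 else "no"
    return {"prompt": prompt, "answer": answer}

def validate(instance: dict, candidate: str) -> bool:
    """Validator: programmatically check a candidate answer against the
    instance's ground truth, giving a verifiable reward signal."""
    return candidate.strip().lower() == instance["answer"]

def consensus_gate(instance: dict, solvers, quorum: int = 2) -> bool:
    """Multi-strategy consensus gate (sketch): keep an instance only if at
    least `quorum` independent solvers reproduce the verified answer.
    Ill-posed or unsolvable instances fail the gate and are filtered
    out before training."""
    agree = sum(validate(instance, solve(instance["prompt"])) for solve in solvers)
    return agree >= quorum
```

In this sketch, `solvers` would be independent solving strategies (in the paper, agents that write and execute code under blind review); an instance that no quorum of solvers can crack is treated as ill-posed and discarded.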