LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing benchmarks offer little controllable evaluation of joint logical-numerical reasoning, making it hard to characterize model deficiencies precisely. To address this, the authors propose LogiNumSynth, the first natural language inference problem generation framework to enable controllable synthesis along every dimension: it independently modulates logical depth (rule-chain length), world-modeling complexity, and numerical computation difficulty, while generating stepwise reasoning traces alongside final answers. Its modular, rule-guided architecture permits fine-grained intervention, supporting both diagnostic assessment and targeted data augmentation. Experiments show that state-of-the-art large language models exhibit substantial performance gaps on these controlled tasks, underscoring both the diagnostic precision of LogiNumSynth and its value as a high-quality, semantically grounded source of training data.

📝 Abstract
Joint logical-numerical reasoning remains a major challenge for language models, yet existing datasets rely on fixed rule sets and offer limited control over task complexity, constraining their generalizability for evaluation and training. We present LogiNumSynth, a flexible natural language problem synthesizer that generates tasks requiring proficiency in joint logical reasoning (e.g., rule-based reasoning) and numerical reasoning (e.g., arithmetic computation). LogiNumSynth supports fine-grained control over reasoning world richness, logical reasoning depth, and the complexity of numerical computations, enabling flexible data synthesis across difficulty levels. We demonstrate three key contributions: (1) Synthesizer -- synthesizing fully controllable joint reasoning tasks over natural language; (2) Evaluation & Process Analysis -- evaluating both process accuracy and answer accuracy; (3) Targeted Training -- using synthesized data to enhance LLMs' reasoning performance. Experiments with multiple LLMs highlight persistent weaknesses in logical-numerical reasoning, showing that LogiNumSynth can serve as both a diagnostic tool and a source of targeted supervision for advancing integrated reasoning skills.
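To make the idea of controllable synthesis concrete, here is a minimal sketch of a rule-chain generator with independently tunable logical depth and operand range. All names (`synthesize`, `attr0`, the `plus`/`times` rule vocabulary) are hypothetical illustrations, not the paper's actual implementation, which is a richer, modular, rule-guided system.

```python
import random

def synthesize(depth=3, max_operand=9, seed=0):
    """Toy sketch of controllable synthesis: build a rule chain
    attr0 -> attr1 -> ... -> attr_depth, where each rule applies one
    arithmetic step. Returns the problem text, a stepwise reasoning
    trace, and the gold final answer."""
    rng = random.Random(seed)
    value = rng.randint(1, max_operand)          # seed fact
    problem = [f"attr0 is {value}."]
    trace = []
    for i in range(depth):                       # logical depth knob
        op, k = rng.choice([("plus", rng.randint(1, max_operand)),
                            ("times", rng.randint(2, 4))])
        problem.append(f"attr{i + 1} is attr{i} {op} {k}.")
        new_value = value + k if op == "plus" else value * k
        trace.append(f"attr{i + 1} = {value} {op} {k} = {new_value}")
        value = new_value
    question = f"What is attr{depth}?"
    return " ".join(problem) + " " + question, trace, value
```

Because depth and operand range are separate arguments, difficulty can be varied along one axis while the other is held fixed, which is the kind of fine-grained intervention the paper uses for diagnostic evaluation.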
Problem

Research questions and friction points this paper is trying to address.

Synthesizing joint logical-numerical reasoning problems for language models
Enabling flexible control over reasoning complexity and difficulty levels
Addressing weaknesses in integrated reasoning through diagnostic evaluation and training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthesizes joint logical-numerical reasoning problems
Enables fine-grained control over task complexity
Provides diagnostic tool and targeted training data