🤖 AI Summary
Problem: Chain-of-Thought (CoT) reasoning lacks the modeling capacity for complex tasks that require topological reasoning. Method: This paper proposes a reasoning framework based on dynamic topology optimization, integrating topological modeling, reinforcement learning, and multi-task reward design with dynamic path optimization and automatic data segmentation. Contributions/Results: (1) a Topological Annotation Generation (TAG) system automatically produces structured annotations of reasoning paths; (2) a reward-driven Topological-Scaling mechanism adaptively adjusts reasoning depth and breadth; (3) a Multi-task Topological Reward Model (M-TRM) jointly selects the best reasoning topology and answer in a single forward pass. On MATH and GSM8K, the method gains up to 10.02% accuracy, shortens responses by over 5%, and lowers inference latency. Compared to a single-task TRM, M-TRM improves accuracy by 10% and topology rank correlation by 9%.
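To make the reward-driven scaling idea concrete, here is a minimal illustrative sketch, not the paper's implementation: all names, the greedy expansion rule, and the stopping threshold are assumptions. It grows the number of parallel reasoning paths (breadth) only while a reward signal keeps improving by a margin, capturing the adaptive depth/breadth adjustment described above.

```python
import math

def scale_breadth(reward_at, max_breadth=8, min_gain=0.05):
    """Hypothetical greedy scaling rule: widen the reasoning topology
    while each extra path still buys at least `min_gain` reward."""
    best_b, best_r = 1, reward_at(1)
    for b in range(2, max_breadth + 1):
        r = reward_at(b)
        if r - best_r < min_gain:
            break  # diminishing returns: stop scaling here
        best_b, best_r = b, r
    return best_b, best_r

# Toy reward curve with diminishing returns as breadth grows.
breadth, reward = scale_breadth(lambda b: 1 - math.exp(-0.5 * b))
```

Under this toy curve the rule settles on a moderate breadth rather than the maximum, which is the behavior that lets such a mechanism trade accuracy against response length and latency.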
📝 Abstract
Large Language Models (LLMs) excel in reasoning but remain constrained by their Chain-of-Thought (CoT) approach, which struggles with complex tasks requiring more nuanced topological reasoning. We introduce SOLAR, Scalable Optimization of Large-scale Architecture for Reasoning, a framework that dynamically optimizes various reasoning topologies to enhance accuracy and efficiency. Our Topological Annotation Generation (TAG) system automates topological dataset creation and segmentation, improving post-training and evaluation. Additionally, we propose Topological-Scaling, a reward-driven framework that aligns training and inference scaling, equipping LLMs with adaptive, task-aware reasoning. SOLAR achieves substantial gains on MATH and GSM8K: +5% accuracy with Topological Tuning, +9% with Topological Reward, and +10.02% with Hybrid Scaling. It also reduces response length by over 5% for complex problems, lowering inference latency. To support the reward framework, we train a multi-task Topological Reward Model (M-TRM), which autonomously selects the best reasoning topology and answer in a single pass, eliminating the need to train and run inference on multiple single-task TRMs (S-TRMs) and thus reducing both training cost and inference latency. M-TRM also surpasses all S-TRMs in performance, improving accuracy by +10% and rank correlation by +9%. To the best of our knowledge, SOLAR sets a new benchmark for scalable, high-precision LLM reasoning while introducing an automated annotation process and a dynamic reasoning topology competition mechanism.
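The single-pass selection that M-TRM performs can be sketched as reranking candidate (topology, answer) pairs with one shared scoring function. This is an illustration only: the `Candidate` fields, the hand-written `score` heuristic, and the feature names are hypothetical stand-ins for the learned multi-task reward head, not the paper's actual model.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    topology: str          # e.g. "chain", "tree", "graph"
    answer: str            # final answer produced under that topology
    features: dict = field(default_factory=dict)  # toy stand-in for model inputs

def score(cand: Candidate) -> float:
    """Hypothetical reward: favor coherent traces, penalize depth
    (a deeper topology costs more inference latency)."""
    depth_penalty = 0.1 * cand.features.get("depth", 0)
    return cand.features.get("coherence", 0.0) - depth_penalty

def select_best(candidates: list[Candidate]) -> Candidate:
    # One pass over all candidates jointly ranks topology and answer,
    # instead of running a separate single-task reward model per topology.
    return max(candidates, key=score)

cands = [
    Candidate("chain", "42", {"coherence": 0.7, "depth": 2}),
    Candidate("tree",  "42", {"coherence": 0.9, "depth": 3}),
    Candidate("graph", "41", {"coherence": 0.6, "depth": 6}),
]
best = select_best(cands)
```

The design point the sketch captures is that topology selection and answer selection share one ranking pass, which is why replacing several S-TRMs with one M-TRM cuts both training cost and inference latency.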