SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Chain-of-Thought (CoT) reasoning lacks the modeling capacity for complex tasks that require topological reasoning. Method: The paper proposes a reasoning framework based on dynamic topology optimization, integrating topological modeling, reinforcement learning, and multi-task reward design with dynamic path optimization and automatic data segmentation. Contributions/Results: (1) a Topological Annotation Generation (TAG) system automatically produces structured annotations of reasoning paths; (2) a reward-driven Topological-Scaling mechanism adaptively adjusts reasoning depth and breadth; (3) a multi-task Topological Reward Model (M-TRM) jointly selects the best reasoning topology and answer in a single forward pass. On MATH and GSM8K, the method gains up to 10.02% accuracy, shortens responses by more than 5%, and lowers inference latency. Compared to single-task TRMs (S-TRMs), M-TRM improves accuracy by 10% and topology rank correlation by 9%.
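The single-pass selection idea behind M-TRM can be illustrated with a toy sketch: one shared scorer ranks every candidate (topology, answer) pair jointly, instead of training a separate reward model per topology. The feature extraction and linear weights below are illustrative placeholders, not the paper's actual model.

```python
# Toy sketch of a multi-task topological reward model (M-TRM):
# a single shared scorer ranks candidate (topology, answer) pairs
# and returns the best of both in one pass.

TOPOLOGIES = ("chain", "tree", "graph")

def features(topology: str, answer: str) -> list[float]:
    # Hypothetical features: topology complexity and answer length.
    depth = {"chain": 1.0, "tree": 2.0, "graph": 3.0}[topology]
    return [depth, float(len(answer))]

def score(feat: list[float], weights: list[float]) -> float:
    # Shared linear scorer standing in for the learned reward model.
    return sum(f * w for f, w in zip(feat, weights))

def select_best(candidates: list[tuple[str, str]], weights: list[float]):
    """Rank all (topology, answer) candidates jointly; return the winner."""
    scored = [(score(features(t, a), weights), t, a) for t, a in candidates]
    scored.sort(reverse=True)
    best_score, best_topology, best_answer = scored[0]
    return best_topology, best_answer, best_score
```

Because every candidate is scored by the same model, one ranking pass replaces the separate training and inference runs that multiple single-task TRMs would require.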

📝 Abstract
Large Language Models (LLMs) excel in reasoning but remain constrained by their Chain-of-Thought (CoT) approach, which struggles with complex tasks requiring more nuanced topological reasoning. We introduce SOLAR, Scalable Optimization of Large-scale Architecture for Reasoning, a framework that dynamically optimizes various reasoning topologies to enhance accuracy and efficiency. Our Topological Annotation Generation (TAG) system automates topological dataset creation and segmentation, improving post-training and evaluation. Additionally, we propose Topological-Scaling, a reward-driven framework that aligns training and inference scaling, equipping LLMs with adaptive, task-aware reasoning. SOLAR achieves substantial gains on MATH and GSM8K: +5% accuracy with Topological Tuning, +9% with Topological Reward, and +10.02% with Hybrid Scaling. It also reduces response length by over 5% for complex problems, lowering inference latency. To foster the reward system, we train a multi-task Topological Reward Model (M-TRM), which autonomously selects the best reasoning topology and answer in a single pass, eliminating the need for training and inference on multiple single-task TRMs (S-TRMs), thus reducing both training cost and inference latency. In addition, in terms of performance, M-TRM surpasses all S-TRMs, improving accuracy by +10% and rank correlation by +9%. To the best of our knowledge, SOLAR sets a new benchmark for scalable, high-precision LLM reasoning while introducing an automated annotation process and a dynamic reasoning topology competition mechanism.
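The abstract's "adaptive, task-aware reasoning" can be sketched as an inference-time escalation loop: try cheaper reasoning topologies first and move to richer ones only when a reward model is not confident enough, which is one way the reported latency savings on easy problems could arise. The escalation order, threshold, and reward values are assumptions for illustration, not the paper's procedure.

```python
# Hedged sketch of reward-driven topological scaling at inference time.
from typing import Callable

# Hypothetical topologies ordered by increasing reasoning cost.
ESCALATION_ORDER = ("chain", "tree", "graph")

def topological_scaling(
    solve: Callable[[str, str], str],     # (problem, topology) -> answer
    reward: Callable[[str, str], float],  # (topology, answer) -> score in [0, 1]
    problem: str,
    threshold: float = 0.8,
):
    """Return the first (topology, answer, reward) that clears the threshold,
    falling back to the best candidate seen if none does."""
    best = None
    for topology in ESCALATION_ORDER:
        answer = solve(problem, topology)
        r = reward(topology, answer)
        if r >= threshold:
            return topology, answer, r  # cheap topology sufficed; stop early
        if best is None or r > best[2]:
            best = (topology, answer, r)
    return best
```

Under this reading, simple problems terminate at the cheapest topology while hard ones escalate, so average response length and latency drop without capping reasoning depth.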
Problem

Research questions and friction points this paper is trying to address.

Improving LLM reasoning accuracy and efficiency beyond fixed CoT
Automating topological dataset creation and segmentation
Reducing reward-model training cost and inference latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic optimization of reasoning topologies
Automated topological dataset creation and segmentation
Multi-task Topological Reward Model (M-TRM) for single-pass topology and answer selection