Learning Adaptive Parallel Reasoning with Language Models

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Contemporary large language models (LLMs) face a dual bottleneck in reasoning: serial chain-of-thought inference suffers from high latency and rapid context exhaustion, while parallel approaches (e.g., self-consistency) exhibit weak inter-thread coordination, substantial redundancy, and marginal performance gains. This paper introduces Adaptive Parallel Reasoning (APR), an end-to-end framework that lets the model dynamically schedule between serial and multi-threaded parallel computation. Its core innovations are: (1) an adaptive multi-threaded inference mechanism built on spawn()/join() primitives; and (2) an end-to-end reinforcement learning policy, free of predefined structural constraints, that jointly optimizes parent and child thread decisions. Evaluated on the Countdown task, APR achieves substantial improvements over the serial baseline: 83.4% accuracy at a 4k context window (+23.4 points), 80.1% at a 20k total-token budget (+13.5 points), and 75.2% at a roughly 5,000 ms latency budget (+17.9 points).

📝 Abstract
Scaling inference-time computation has substantially improved the reasoning capabilities of language models. However, existing methods have significant limitations: serialized chain-of-thought approaches generate overly long outputs, leading to increased latency and exhausted context windows, while parallel methods such as self-consistency suffer from insufficient coordination, resulting in redundant computations and limited performance gains. To address these shortcomings, we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures. Experiments on the Countdown reasoning task demonstrate significant benefits of APR: (1) higher performance within the same context window (83.4% vs. 60.0% at 4k context); (2) superior scalability with increased computation (80.1% vs. 66.6% at 20k total tokens); (3) improved accuracy at equivalent latency (75.2% vs. 57.3% at approximately 5,000ms). APR represents a step towards enabling language models to autonomously optimize their reasoning processes through adaptive allocation of computation.
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of serialized and parallel reasoning methods in language models
Enabling adaptive multi-threaded inference with spawn() and join() operations
Optimizing parent and child inference threads via reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Parallel Reasoning (APR) framework
End-to-end reinforcement learning strategy
spawn() and join() operations for adaptive multi-threaded inference
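The spawn()/join() orchestration described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `child_inference` function, the `ReasoningThread` class, and the Countdown-style branch strings are all hypothetical stand-ins for the model's actual decoding calls.

```python
# Hypothetical sketch of APR-style spawn()/join() orchestration.
# A parent reasoning thread forks child inference threads over candidate
# sub-problems, then joins to fold their results back into its own context.
from concurrent.futures import ThreadPoolExecutor


def child_inference(subquery: str) -> str:
    # Stand-in for a child thread's decoding pass over one candidate branch.
    return f"partial result for {subquery!r}"


class ReasoningThread:
    """Parent thread that adaptively forks children and merges their outputs."""

    def __init__(self, max_children: int = 4):
        self.pool = ThreadPoolExecutor(max_workers=max_children)
        self.pending = []

    def spawn(self, subquery: str) -> None:
        # spawn(): fork an independent child inference over a sub-problem;
        # the parent keeps generating without waiting for the child.
        self.pending.append(self.pool.submit(child_inference, subquery))

    def join(self) -> list[str]:
        # join(): block until all spawned children finish, then return
        # their outputs for the parent to condition on.
        results = [future.result() for future in self.pending]
        self.pending = []
        return results


parent = ReasoningThread()
for branch in ["try 3*(8-4)", "try (8+4)/3"]:
    parent.spawn(branch)
summaries = parent.join()  # parent resumes with the merged child outputs
```

In the paper's framing, the reinforcement learning policy decides *when* to call spawn() and what sub-problems to hand each child, rather than following a fixed tree structure as in search-based baselines.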