🤖 AI Summary
Large reasoning models exhibit a significant gap between their multi-path reasoning potential and their single-inference performance. To address this, we propose A2R, an asymmetric two-stage parallel inference framework: an explorer model generates diverse solution paths in parallel, and a synthesizer model then performs deeper, integrated reasoning over those paths. A2R is a plug-and-play framework whose "small-explorer, large-synthesizer" configuration jointly optimizes computational efficiency and reasoning quality. Its design, built on multi-path sampling, reference ensembling, and decoupled two-stage execution, enables flexible model composition and scalable extension. On complex reasoning tasks, the Qwen3-8B-distill model augmented with A2R achieves a 75% performance improvement over its self-consistency baseline. Moreover, the lightweight A2R-Efficient variant (a Qwen3-4B explorer paired with a Qwen3-8B synthesizer) surpasses the average performance of a monolithic Qwen3-32B model at nearly 30% lower inference cost, substantially narrowing the gap between a model's realized and latent capabilities.
📝 Abstract
Recent Large Reasoning Models have achieved significant improvements in complex task-solving by allocating more computation at inference time under a "thinking longer" paradigm. Yet even as foundational reasoning capabilities advance rapidly, a persistent gap remains between a model's performance in a single attempt and its latent potential, which is often revealed only across multiple solution paths. To address this, we present A2R, an Asymmetric Two-Stage Reasoning framework designed to explicitly bridge the gap between a model's potential and its actual performance. In this framework, an "explorer" model first generates candidate solutions in parallel through repeated sampling. Subsequently, a "synthesizer" model integrates these references in a more refined, second stage of reasoning. This two-stage process scales computation along an axis orthogonal to existing sequential methods. Our work makes two key contributions. First, we present A2R as a plug-and-play parallel reasoning framework that explicitly enhances a model's capabilities on complex questions; for example, within our framework the Qwen3-8B-distill model achieves a 75% performance improvement over its self-consistency baseline. Second, through a systematic analysis of the explorer and synthesizer roles, we identify an effective asymmetric scaling paradigm. This insight leads to A2R-Efficient, a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer and surpasses the average performance of a monolithic Qwen3-32B model at nearly 30% lower cost. Collectively, these results show that A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
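Below is a minimal sketch of the two-stage flow described in the abstract, assuming a generic `generate(model, prompt, temperature)` helper that stands in for whatever inference client serves the checkpoints; the model names, prompt wording, and sampling parameters are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of the A2R two-stage flow (illustrative only).
# `generate` is a placeholder for a single chat-completion call; replace it
# with your own inference client. Its name and signature are assumptions.
from concurrent.futures import ThreadPoolExecutor


def generate(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for one LLM call; wire this to your serving stack."""
    raise NotImplementedError


def a2r_answer(question: str,
               explorer: str = "Qwen3-4B",     # small explorer (assumed name)
               synthesizer: str = "Qwen3-8B",  # larger synthesizer (assumed name)
               n_paths: int = 8) -> str:
    # Stage 1: the explorer samples diverse solution paths in parallel.
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        paths = list(pool.map(
            lambda _: generate(explorer, question, temperature=0.7),
            range(n_paths),
        ))

    # Stage 2: the synthesizer reasons over the question plus all references
    # and produces a single refined answer. Prompt wording is illustrative.
    references = "\n\n".join(
        f"[Reference {i + 1}]\n{p}" for i, p in enumerate(paths)
    )
    synthesis_prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solutions from a first-pass explorer:\n{references}\n\n"
        "Review the candidates, resolve any disagreements, and give a final answer."
    )
    return generate(synthesizer, synthesis_prompt, temperature=0.0)
```

In this sketch the stage-one calls are embarrassingly parallel, which is why an asymmetric pairing (a small explorer with a larger synthesizer) can trade cheap breadth for a single heavier consolidation pass.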