🤖 AI Summary
This work addresses a central tension in large language model (LLM) test-time reasoning: parallel sampling explores broadly but shallowly, while sequential self-refinement thinks deeply but narrowly. The authors propose Recursive Self-Aggregation (RSA), a test-time scaling method that generates a population of candidate reasoning chains in parallel, then iteratively improves that population by aggregating small subsets of chains into refined solutions, which form the candidate pool for the next iteration. Inspired by evolutionary methods, RSA exploits partially correct intermediate steps across different chains of thought and introduces no additional trainable parameters at inference time; a complementary aggregation-aware reinforcement learning procedure further trains the model to combine solutions effectively. Empirically, RSA delivers substantial gains with increasing compute budgets: on benchmarks such as AIME-25, Qwen3-4B-Instruct-2507 equipped with RSA becomes competitive with larger reasoning models (e.g., DeepSeek-R1 and o3-mini (high)), while consistently outperforming purely parallel and purely sequential scaling baselines.
📝 Abstract
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
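The RSA loop described in the abstract can be sketched in a few lines. This is a minimal sketch of the control flow only: the function names `generate` and `aggregate` are assumptions standing in for LLM prompt calls (one that samples a fresh reasoning chain, one that combines a subset of chains into an improved one), not the authors' actual API, and the population size, subset size, and step count are illustrative defaults.

```python
import random

def rsa(generate, aggregate, problem, pop_size=8, subset_size=4, steps=3, seed=0):
    """Sketch of Recursive Self-Aggregation (RSA).

    generate(problem) -> one candidate reasoning chain (parallel scaling).
    aggregate(problem, chains) -> one improved chain built from a subset of
        candidates (sequential refinement). Both are placeholders for LLM calls.
    """
    rng = random.Random(seed)
    # Parallel step: sample an initial population of candidate chains.
    population = [generate(problem) for _ in range(pop_size)]
    for _ in range(steps):
        # Evolutionary step: each new candidate aggregates a random subset of
        # the current population, so partially correct intermediate reasoning
        # (not just final answers) can propagate into the next iteration.
        population = [
            aggregate(problem, rng.sample(population, subset_size))
            for _ in range(pop_size)
        ]
    # A final answer can then be selected from the population, e.g. by
    # majority vote over extracted answers.
    return population
```

In the paper both roles are played by the same LLM under different prompts; a toy `aggregate` such as `max` over numeric "chains" is enough to exercise the population dynamics and see later iterations dominate earlier ones.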