MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

📅 2025-10-06
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) suffer from "over-reasoning": they rely excessively on System 2-style deliberate reasoning even for simple tasks, generating tokens inefficiently; moreover, their static pretraining data hinders adaptation to dynamic environments. This paper proposes MARS, a Multi-Agent System for Deep ReSearch built on a dual-system collaborative architecture: System 2 performs deliberate reasoning and selectively invokes external tools (e.g., search, computation), while System 1 rapidly processes and distills high-volume tool output into concise insights for System 2. A multi-agent reinforcement learning framework that extends Group Relative Policy Optimization (GRPO) dynamically allocates reasoning responsibilities and jointly optimizes both systems over multi-turn tool interactions; bin-packing-based task scheduling and sample balancing improve training stability. On the Humanity's Last Exam benchmark, MARS achieves a 3.86% accuracy gain, and it gains an average of 8.9% across seven knowledge-intensive tasks, significantly strengthening complex reasoning and real-time information integration.
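The summary names an extension of Group Relative Policy Optimization as the training backbone. The paper's multi-agent extension is not detailed here, but the core idea GRPO adds over PPO, normalizing each rollout's reward against its own sampling group instead of a learned critic, can be sketched as follows (a minimal illustration under standard GRPO, not the paper's code):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in standard GRPO: for a group of
    rollouts sampled from the same prompt, normalize each reward by the
    group's mean and (population) standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: all-equal rewards give std 0
    return [(r - mean) / std for r in rewards]

# Example: 4 rollouts for one prompt, rewarded 1.0 (correct) or 0.0 (wrong)
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])  # → [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is the group mean, advantages always sum to (numerically) zero within a group; correct rollouts are pushed up exactly as much as incorrect ones are pushed down.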

📝 Abstract
Large Reasoning Models (LRMs) often exhibit a tendency toward overanalysis in simple tasks, excessively invoking System 2-type deliberate reasoning and generating tokens inefficiently. These models also struggle to adapt their reasoning capabilities to rapidly changing environments because of the static nature of their pretraining data. Addressing these issues requires approaches that bridge intuitive and deliberate cognitive processes, akin to the dual-system dynamic of human cognition. This paper introduces a Multi-Agent System for Deep ReSearch (MARS) that seamlessly integrates System 1's fast, intuitive thinking with System 2's deliberate reasoning within LLMs. MARS strategically integrates multiple external tools, such as Google Search, Google Scholar, and a Python interpreter, to access up-to-date information and execute complex computations, and establishes a specialized division of labor in which System 1 efficiently processes and summarizes high-volume external information, providing distilled insights that expand System 2's reasoning context without overwhelming its capacity. Furthermore, we propose a multi-agent reinforcement learning framework that extends Group Relative Policy Optimization to optimize both systems simultaneously over multi-turn tool interactions, with bin-packing optimization and sample-balancing strategies that enhance collaborative efficiency. Extensive experiments demonstrate that MARS achieves substantial improvements of 3.86% on the challenging Humanity's Last Exam (HLE) benchmark and an average gain of 8.9% across 7 knowledge-intensive tasks, validating the effectiveness of our dual-system paradigm for complex reasoning in dynamic information environments.
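The abstract mentions bin-packing optimization for training efficiency. The paper's exact scheduler isn't described here, but the usual motivation is packing variable-length rollouts into fixed token budgets so batches waste little padding; a classic first-fit-decreasing heuristic illustrates the idea (an assumed, generic sketch, not the paper's implementation):

```python
def pack_samples(lengths, capacity):
    """First-fit decreasing bin packing: place each sample (by token
    length, longest first) into the first batch with room, opening a
    new batch when none fits."""
    bins = []  # each bin: [remaining_capacity, [packed lengths]]
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError("sample exceeds batch capacity")
        for b in bins:
            if b[0] >= length:
                b[0] -= length
                b[1].append(length)
                break
        else:  # no existing batch had room
            bins.append([capacity - length, [length]])
    return [b[1] for b in bins]

# Pack five rollouts (token lengths) into batches of at most 1000 tokens
batches = pack_samples([700, 300, 500, 200, 400], capacity=1000)
# → [[700, 300], [500, 400], [200]]
```

Sorting longest-first is what makes the heuristic effective: large items claim fresh bins early, and small items backfill the leftover capacity.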
Problem

Research questions and friction points this paper is trying to address.

Optimizing dual-system reasoning to prevent overanalysis in simple tasks
Adapting LLMs to dynamic environments beyond static pretraining data
Integrating intuitive and deliberate reasoning with multi-agent tool interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system integrates intuitive and deliberate reasoning
Reinforcement learning optimizes dual-system collaboration with tools
External tools provide updated information and computational capabilities
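The division of labor in the bullets above can be made concrete with a control-loop sketch: System 2 either answers or requests a tool, and System 1 compresses the tool's raw output before it re-enters System 2's context. Every interface here (`deep_research`, the action tuples, the tool registry) is a hypothetical stand-in for illustration, not the paper's API:

```python
def deep_research(question, system2, system1, tools, max_turns=4):
    """Hypothetical dual-system loop. `system2(context)` returns either
    ("answer", text) or ("tool", (name, query)); `system1(raw)` distills
    bulky tool output into a short summary appended to the context."""
    context = question
    for _ in range(max_turns):
        action, payload = system2(context)
        if action == "answer":           # System 2 is confident: stop early
            return payload
        name, query = payload
        raw = tools[name](query)         # e.g. search / scholar / interpreter
        context += "\n" + system1(raw)   # distilled evidence, not the raw dump
    action, payload = system2(context)   # final attempt after turn budget
    return payload if action == "answer" else None
```

The key design point mirrored from the abstract: raw tool output never enters System 2's context directly, so its reasoning window holds distilled evidence rather than pages of retrieved text.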