AI Summary
To address the challenge of dynamically constructing a customized multi-agent system for each user query, this paper proposes FlowReasoner, a query-level meta-agent framework. FlowReasoner generates a dedicated multi-agent workflow for each individual query end to end, using deliberative reasoning to design the workflow and reinforcement learning (RL) with external execution feedback to optimize it. Its contributions are threefold: (1) a query-level meta-agent paradigm that builds one system per user query; (2) a multi-purpose reward that balances performance, complexity, and efficiency; and (3) initialization of the meta-agent by distilling DeepSeek R1, which provides basic reasoning ability before RL training. Evaluated on engineering and competition code benchmarks, FlowReasoner surpasses o1-mini by 10.52% in accuracy across three benchmarks and outperforms the compared baselines.
Abstract
This paper proposes FlowReasoner, a query-level meta-agent that automates the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic reasoning ability needed to generate multi-agent systems. We then further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward guides the RL training in terms of performance, complexity, and efficiency. In this manner, FlowReasoner can generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% in accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
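To make the idea of a multi-purpose reward concrete, the following is a minimal Python sketch of how performance, complexity, and efficiency signals from one workflow execution might be combined into a single scalar. It is not the formulation used by FlowReasoner: the abstract does not specify the reward. The `ExecutionFeedback` fields, the weights, the normalization caps, and the linear form are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ExecutionFeedback:
    """Hypothetical summary of one executed workflow (all fields are assumptions)."""
    pass_rate: float    # fraction of test cases passed, in [0, 1] (performance)
    num_agents: int     # number of agents in the workflow (complexity proxy)
    wall_time_s: float  # execution time in seconds (efficiency proxy)


def multi_purpose_reward(
    fb: ExecutionFeedback,
    w_perf: float = 1.0,          # assumed weight on performance
    w_complexity: float = 0.1,    # assumed weight on the complexity penalty
    w_efficiency: float = 0.1,    # assumed weight on the efficiency penalty
    max_agents: int = 10,         # assumed cap used to normalize complexity
    time_budget_s: float = 60.0,  # assumed cap used to normalize run time
) -> float:
    """Reward performance; penalize complex or slow workflows."""
    # Normalize both penalties to [0, 1] so the weights are comparable.
    complexity_penalty = min(fb.num_agents / max_agents, 1.0)
    efficiency_penalty = min(fb.wall_time_s / time_budget_s, 1.0)
    return (
        w_perf * fb.pass_rate
        - w_complexity * complexity_penalty
        - w_efficiency * efficiency_penalty
    )


# Two workflows that solve the task equally well: the simpler, faster one scores higher.
simple = ExecutionFeedback(pass_rate=1.0, num_agents=2, wall_time_s=6.0)
heavy = ExecutionFeedback(pass_rate=1.0, num_agents=8, wall_time_s=48.0)
reward_simple = multi_purpose_reward(simple)
reward_heavy = multi_purpose_reward(heavy)
```

Under these assumed weights, accuracy dominates the reward, while the complexity and efficiency terms act as tie-breakers between workflows that perform equally well. A scalar of this kind could then serve as the external execution feedback signal during RL training.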