🤖 AI Summary
Existing AI approaches to algorithmic problem solving either rely on costly, hard-to-interpret model-centric strategies or use fragmented tool and prompting techniques, and lack a unified, interpretable, and efficient reasoning framework. This work proposes MAS-Algorithm, a systematic application of multi-agent systems to algorithmic programming problem solving. Inspired by the practices of competitive programmers and algorithm engineers, it decomposes the solving process into modular, collaborating stages that support structured reasoning and integration with external tools, and its extensible design is intended to generalize across diverse problem types. Across multiple Qwen-series models, it improves the average acceptance rate by 6.48% on a self-constructed benchmark, whereas parameter-efficient fine-tuning on the same data gains only 0.89%, and it yields a 4.72% gain on LiveCodeBench-Pro. In replacement and ablation studies, individual agents contribute improvements of up to 27.7%.
📝 Abstract
Algorithmic problem solving is a rigorous testbed for AI coding systems because it directly reflects a model's ability to perform structured reasoning in complex scenarios. Existing approaches predominantly rely on model-centric strategies, such as architectural modifications and data scaling, which are costly and offer limited interpretability. Alternative methods that leverage external tools or prompting techniques (e.g., chain-of-thought) are often fragmented and lack a unified framework.

In this paper, we propose MAS-Algorithm, a systematic multi-agent workflow for algorithmic problem solving inspired by the practices of competitive programmers and algorithm engineers. Our framework decomposes the end-to-end solving process into modular stages, enabling structured reasoning, tool integration, and flexible coordination among agents. The design emphasizes both rigor and extensibility, allowing it to generalize across diverse problem types.

Experiments on a self-constructed benchmark show consistent improvements across multiple Qwen-series models, with an average gain of 6.48% in acceptance rate; by contrast, parameter-efficient fine-tuning on the same data yields only a marginal improvement of 0.89%. We further observe a 4.72% gain on LiveCodeBench-Pro, along with consistent improvements on additional accuracy and efficiency metrics.

Beyond performance gains, we conduct comprehensive analyses of the reasoning process within the workflow, including error patterns and cross-scenario behaviors. We further perform customized replacement and ablation studies to probe the upper bound of the framework, showing that individual agents can contribute improvements of up to 27.7%. These results highlight the strong potential of MAS-Algorithm for advancing AI-driven algorithmic reasoning.