🤖 AI Summary
Existing large language model (LLM)-based multi-agent systems suffer from low flexibility, poor adaptability, and limited scalability on complex tasks, problems rooted in discrete optimization paradigms and constrained representational capacity. To address this, the authors propose ScoreFlow, a framework that optimizes agent workflows through end-to-end gradient-based updates in a continuous parameter space. At its core is Score-DPO, a variant of Direct Preference Optimization (DPO) that integrates quantitative evaluation feedback into the preference loss. The approach models multi-task workflows parametrically and adds lightweight agent coordination scheduling, overcoming the limitations of traditional discrete search. Evaluated across six question-answering, programming, and mathematical-reasoning benchmarks, ScoreFlow achieves an average performance gain of 8.2%. Notably, it enables smaller models to outperform larger ones at lower inference cost, demonstrating efficiency, adaptability, and scalability, and establishing a unified paradigm for optimizing multi-agent workflows.
📝 Abstract
Recent research has leveraged large language model-based multi-agent systems for complex problem solving while seeking to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an average improvement of 8.2% over existing baselines. Moreover, it enables smaller models to outperform larger ones at lower inference cost. Project: https://github.com/Gen-Verse/ScoreFlow
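The abstract does not spell out the Score-DPO loss, so the following is an illustrative sketch only: a standard DPO preference loss extended with a hypothetical score-gap weight, so that preference pairs with larger quantitative evaluation differences contribute more to the update. The function names and the specific weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_*     : policy log-probability of the preferred (w) / rejected (l) output
    ref_logp_* : reference-model log-probabilities of the same outputs
    beta       : temperature on the implicit reward margin
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))

def score_weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                            score_w, score_l, beta=0.1):
    """Hypothetical Score-DPO-style loss (illustrative, not the paper's exact form):
    weight each pair by the evaluator's score gap, so pairs with strong
    quantitative separation dominate the gradient."""
    weight = max(score_w - score_l, 0.0)  # assumed scores in [0, 1], winner scored higher
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
```

For example, a pair whose workflows score 0.9 and 0.4 under the evaluator would be weighted by 0.5 relative to plain DPO, while a near-tie (0.51 vs. 0.49) contributes almost nothing.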