🤖 AI Summary
Existing large language model (LLM)-based multi-agent systems suffer from low flexibility, poor adaptability, and limited scalability on complex tasks, problems rooted in discrete optimization paradigms and constrained representational capacity. To address this, the authors propose ScoreFlow, a framework that optimizes agent workflows through end-to-end gradient-based updates in a continuous parameter space. At its core is Score-DPO, a variant of Direct Preference Optimization (DPO) that integrates quantitative evaluation feedback into the preference loss. The approach models multi-task workflows parametrically and adds lightweight agent coordination scheduling, overcoming the limitations of traditional discrete search. Evaluated across six question-answering, programming, and mathematical-reasoning benchmarks, ScoreFlow achieves an average performance gain of 8.2%. Notably, it enables smaller models to outperform larger ones at lower inference cost, demonstrating efficiency, adaptability, and scalability, and establishing a unified paradigm for optimizing multi-agent workflows.
📝 Abstract
Recent research has leveraged large language model-based multi-agent systems for complex problem solving while seeking to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an average improvement of 8.2% over existing baselines. Moreover, it enables smaller models to outperform larger ones at lower inference cost. Project: https://github.com/Gen-Verse/ScoreFlow
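The abstract does not spell out the Score-DPO loss, so the following is an illustrative sketch only: a standard DPO preference loss extended with a hypothetical score-gap weight, so that preference pairs with larger quantitative evaluation differences contribute more to the update. The function names and the specific weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_*     : policy log-probability of the preferred (w) / rejected (l) output
    ref_logp_* : reference-model log-probabilities of the same outputs
    beta       : temperature on the implicit reward margin
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(sigmoid(margin))

def score_weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                            score_w, score_l, beta=0.1):
    """Hypothetical Score-DPO-style loss (illustrative, not the paper's exact form):
    weight each pair by the evaluator's score gap, so pairs with strong
    quantitative separation dominate the gradient."""
    weight = max(score_w - score_l, 0.0)  # assumed scores in [0, 1], winner scored higher
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
```

For example, a pair whose workflows score 0.9 and 0.4 under the evaluator would be weighted by 0.5 relative to plain DPO, while a near-tie (0.51 vs. 0.49) contributes almost nothing.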