🤖 AI Summary
This work addresses the inefficiency of existing LLM agent workflow optimization methods, which rely on coarse-grained end-to-end feedback and struggle to pinpoint problematic modules. To overcome this limitation, we propose JudgeFlow, a framework that constructs workflows from reusable, configurable logic blocks and introduces a dedicated Judge module to analyze execution traces, particularly failed runs. The Judge assigns rank-based responsibility scores to individual blocks, so that an LLM optimizer can direct its refinement at the most problematic block. Our approach achieves, for the first time, block-level fine-grained diagnosis and targeted optimization, improving performance, sample efficiency, and workflow interpretability on mathematical reasoning and code generation benchmarks, and establishing a scalable foundation for automated optimization of complex agent pipelines.
📝 Abstract
Optimizing LLM-based agentic workflows is a key challenge in scaling AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. To address these limitations, we propose JudgeFlow, an Evaluation-Judge-Optimization-Update pipeline. We incorporate reusable, configurable logic blocks into agentic workflows to capture fundamental forms of logic. On top of this abstraction, we design a dedicated Judge module that inspects execution traces, particularly failed runs, and assigns rank-based responsibility scores to problematic blocks. These fine-grained diagnostic signals are then leveraged by an LLM-based optimizer, which focuses modifications on the most problematic block in the workflow. Our approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows. We evaluate JudgeFlow on mathematical reasoning and code generation benchmarks, where JudgeFlow achieves superior performance and efficiency compared to existing methods.
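To make the Evaluation-Judge-Optimization-Update loop concrete, here is a minimal Python sketch of one iteration. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the scoring formula, so the linear rank-to-score mapping, the summation across failed traces, and every name below (`rank_scores`, `aggregate_responsibility`, `optimization_step`, and the `run`, `is_correct`, `judge`, `optimize_block` callables) are hypothetical. In a real system the judge and the optimizer would be LLM calls; here they are plain functions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Workflow = List[str]  # assumption: a workflow is an ordered list of block names


def rank_scores(ranking: Sequence[str]) -> Dict[str, float]:
    """Map a ranking (most responsible block first) to scores in (0, 1].

    Assumed linear scheme: position i of n gets (n - i) / n.
    """
    n = len(ranking)
    return {block: (n - i) / n for i, block in enumerate(ranking)}


def aggregate_responsibility(
    workflow: Workflow,
    failed_traces: Sequence[str],
    judge: Callable[[Workflow, str], List[str]],
) -> Dict[str, float]:
    """Judge step: sum rank-based scores per block over all failed traces."""
    totals: Dict[str, float] = defaultdict(float)
    for trace in failed_traces:
        for block, score in rank_scores(judge(workflow, trace)).items():
            totals[block] += score
    return dict(totals)


def optimization_step(
    workflow: Workflow,
    tasks: Sequence[str],
    run: Callable[[Workflow, str], str],
    is_correct: Callable[[str, str], bool],
    judge: Callable[[Workflow, str], List[str]],
    optimize_block: Callable[[Workflow, str, Sequence[str]], str],
) -> Workflow:
    """One Evaluation -> Judge -> Optimization -> Update iteration."""
    # Evaluation: run every task and keep the traces of failed runs.
    traces = [(task, run(workflow, task)) for task in tasks]
    failed = [trace for task, trace in traces if not is_correct(task, trace)]
    if not failed:
        return workflow  # nothing to diagnose
    # Judge: find the block with the highest aggregated responsibility.
    totals = aggregate_responsibility(workflow, failed, judge)
    target = max(totals, key=totals.get)
    # Optimization: rewrite only the target block.
    new_block = optimize_block(workflow, target, failed)
    # Update: swap the rewritten block into the workflow.
    return [new_block if block == target else block for block in workflow]


# Toy usage with stand-in functions in place of LLM calls.
def toy_judge(workflow: Workflow, trace: str) -> List[str]:
    if "arith" in trace:
        return ["solve", "verify", "plan"]
    return ["verify", "solve", "plan"]


workflow = ["plan", "solve", "verify"]
failed_traces = ["arith error", "arith slip", "format issue"]
totals = aggregate_responsibility(workflow, failed_traces, toy_judge)
target = max(totals, key=totals.get)
print(target)  # solve
```

Restricting the optimizer to a single target block per iteration is what the block-level diagnosis buys: the edit is small, and its effect can be attributed to one block when the workflow is re-evaluated.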