JudgeFlow: Agentic Workflow Optimization via Block Judge

📅 2026-01-12

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the inefficiency of existing LLM agent workflow optimization methods, which rely on coarse-grained end-to-end feedback and struggle to pinpoint problematic modules. To overcome this limitation, we propose JudgeFlow, a framework that constructs workflows from reusable and configurable logical blocks and introduces dedicated Judge modules to analyze execution traces—particularly failure cases. These Judges assign rank-based responsibility scores to individual blocks, enabling an LLM optimizer to focus on the most critical weaknesses for targeted refinement. Our approach achieves, for the first time, block-level fine-grained diagnosis and precise optimization, significantly improving performance, sample efficiency, and workflow interpretability on mathematical reasoning and code generation benchmarks, thereby establishing a scalable foundation for automated optimization of complex agent pipelines.

📝 Abstract

Optimizing LLM-based agentic workflows is challenging for scaling AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. To address these limitations, we propose JudgeFlow, an Evaluation-Judge-Optimization-Update pipeline. We incorporate reusable, configurable logic blocks into agentic workflows to capture fundamental forms of logic. On top of this abstraction, we design a dedicated Judge module that inspects execution traces, particularly failed runs, and assigns rank-based responsibility scores to problematic blocks. These fine-grained diagnostic signals are then leveraged by an LLM-based optimizer, which focuses modifications on the most problematic block in the workflow. Our approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows. We evaluate JudgeFlow on mathematical reasoning and code generation benchmarks, where JudgeFlow achieves superior performance and efficiency compared to existing methods.
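To make the Judge step concrete, here is a minimal Python sketch of how per-trace rankings could be turned into block-level responsibility scores and used to select the block to optimize. The abstract does not give the scoring formula, so the linear rank-to-score mapping, the summation across failed traces, and every name here (`rank_scores`, `most_problematic_block`, `toy_judge`) are illustrative assumptions, not the paper's implementation. In practice an LLM call would stand in for `toy_judge`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a trace is the ordered list of (block_name, block_output)
# pairs recorded during one failed workflow run.
Trace = List[Tuple[str, str]]


def rank_scores(ranking: Sequence[str]) -> Dict[str, float]:
    """Map a ranking (most responsible block first) to scores in (0, 1].

    Assumed scheme: the top-ranked block gets 1.0 and scores fall linearly.
    """
    n = len(ranking)
    return {block: (n - i) / n for i, block in enumerate(ranking)}


def most_problematic_block(
    failed_traces: Sequence[Trace],
    judge: Callable[[Trace], Sequence[str]],
) -> Tuple[str, Dict[str, float]]:
    """Sum rank-based scores over failed traces; return the worst block."""
    totals: Dict[str, float] = defaultdict(float)
    for trace in failed_traces:
        for block, score in rank_scores(judge(trace)).items():
            totals[block] += score
    worst = max(totals, key=totals.get)
    return worst, dict(totals)


def toy_judge(trace: Trace) -> List[str]:
    """Hypothetical stand-in for an LLM judge: blame empty outputs first."""
    outputs = dict(trace)
    # sorted() is stable, so blocks with equal blame keep workflow order.
    return sorted(outputs, key=lambda name: outputs[name] != "")


failed = [
    [("plan", "outline"), ("solve", ""), ("verify", "ok")],
    [("plan", "outline"), ("solve", ""), ("verify", "")],
]
worst, totals = most_problematic_block(failed, toy_judge)
print(worst, totals)
```

The selected block name would then be passed to the optimizer, so that only that block's configuration or prompt is modified in the next update round.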

Problem

Research questions and friction points this paper is trying to address.

agentic workflow

LLM optimization

fine-grained evaluation

workflow refinement

execution trace analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic workflow

block-level diagnosis

Judge module