Failure-Driven Workflow Refinement

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM workflow optimization methods rely on binary success/failure signals alone, discarding fine-grained failure-mode information and preventing accurate modeling and optimization of the failure distribution. This work reframes the problem from a distributional perspective, introducing Expected Failure Mass as the optimization objective and shifting the paradigm from scalar scoring to geometric reshaping of the failure distribution. To realize this, the authors construct a Failure Signature Space (FSS), estimate failure density from an adversarial counterexample pool, and propose the CE-Graph framework, which performs operator-constrained graph edits in high-density failure regions using a Propose-and-Verify mechanism. Evaluated on mathematical reasoning, code generation, and question-answering benchmarks, the method significantly outperforms strong baselines at lower computational cost, empirically validating that systematic distribution-level optimization improves workflow robustness.

📝 Abstract
Optimizing LLM-based workflows is typically formulated as a global search, where candidate workflows are evaluated based on a scalar metric. This paradigm, however, suffers from a critical flaw: information collapse. By reducing rich, multi-step execution traces to simple success/failure signals, existing methods are rendered blind to the underlying structure of failures, fundamentally preventing them from modeling the workflow's failure distribution. We reconceptualize this challenge as a distributional problem. We propose a new paradigm where the optimization goal is not to maximize a scalar score, but to directly minimize a workflow's Expected Failure Mass, i.e., the integral of its failure probability density function defined over a high-dimensional Failure Signature Space (FSS). This distributional lens allows us to move from inefficient, zero-order optimization to a principled, gradient-like descent on the failure landscape itself. We introduce CE-Graph, a framework that operationalizes this paradigm through a novel, failure-driven refinement process. CE-Graph approximates the failure distribution from a pool of counterexamples, identifies its densest regions as recurring failure modes, and applies targeted, operator-constrained graph edits via a Propose-and-Verify mechanism to greedily reduce the failure mass. On math, code, and QA benchmarks, our CE-Graph achieves higher robustness at a significantly lower cost than strong baselines. This suggests that a system's reliability emerges not from avoiding failures, but from systematically learning and reshaping the geometric structure of its failure distributions.
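The abstract's core move, approximating the failure distribution from a counterexample pool and locating its densest regions as recurring failure modes, can be sketched with a toy kernel density estimate. Everything here (the `gaussian_kde` helper, the bandwidth, the 2-D toy pool of failure signatures) is an illustrative assumption, not the paper's actual estimator:

```python
import math

def gaussian_kde(points, x, bandwidth=0.5):
    """Estimate failure density at x from a pool of failure signatures."""
    d = len(x)
    norm = (2 * math.pi * bandwidth ** 2) ** (d / 2)
    total = 0.0
    for p in points:
        sq = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq / (2 * bandwidth ** 2))
    return total / (len(points) * norm)

# Toy pool of 2-D failure signatures (e.g., embedded error traces);
# three points cluster together, one is an isolated outlier.
pool = [[0.1, 0.2], [0.15, 0.25], [0.12, 0.22], [2.0, 2.0]]

# The densest region: the pooled signature with highest estimated density.
densest = max(pool, key=lambda p: gaussian_kde(pool, p))
print(densest)  # → [0.12, 0.22], the center of the recurring-failure cluster
```

In this picture, reducing "failure mass" means editing the workflow so that future executions stop landing in high-density regions like the cluster above, rather than chasing individual outliers.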
Problem

Research questions and friction points this paper is trying to address.

Optimizing LLM workflows by minimizing failure distribution mass
Addressing information collapse in workflow execution traces
Systematically refining workflows through failure-driven graph edits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimizes failure mass via distributional optimization
Uses counterexamples to identify recurring failure modes
Applies targeted graph edits through Propose-and-Verify mechanism
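The Propose-and-Verify mechanism listed above can be sketched as a greedy acceptance rule: an operator-constrained edit is kept only if it passes verification and strictly lowers the estimated failure mass on the counterexample pool. The `refine` function and the toy `propose`/`verify`/`failure_mass` callbacks are hypothetical stand-ins, not the paper's components:

```python
import itertools

def refine(workflow, failures, propose, verify, failure_mass, steps=10):
    """Greedy failure-driven refinement: accept a proposed graph edit only if
    it passes verification and strictly reduces failure mass on the pool."""
    best = failure_mass(workflow, failures)
    for _ in range(steps):
        candidate = propose(workflow, failures)   # operator-constrained edit
        if not verify(candidate, failures):       # reject ill-formed graphs
            continue
        mass = failure_mass(candidate, failures)
        if mass < best:                           # keep only strict improvements
            workflow, best = candidate, mass
    return workflow, best

# Toy model: a workflow is the set of failure modes it already handles;
# failure mass = counterexamples in the pool it still fails on.
failures = ["off_by_one", "null_deref", "timeout", "timeout"]
fixes = itertools.cycle(failures)
propose = lambda wf, fs: wf | {next(fixes)}       # edit: patch one more mode
verify = lambda wf, fs: True                      # toy verifier accepts all edits
failure_mass = lambda wf, fs: sum(f not in wf for f in fs)

wf, m = refine(set(), failures, propose, verify, failure_mass, steps=8)
print(m)  # → 0 once every recurring mode is patched
```

Note the asymmetry: proposals are cheap and may be wrong, but only verified, mass-reducing edits ever change the workflow, which is what makes the descent on the failure landscape greedy and monotone.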
Jusheng Zhang
Sun Yat-sen University
Kaitong Cai
Sun Yat-sen University
Qinglin Zeng
Sun Yat-sen University
Ningyuan Liu
Sun Yat-sen University
Stephen Fan
Ziliang Chen
Pengcheng Lab
Machine Learning · Foundation Models · Multimodal Embodied Intelligence
Keze Wang
Sun Yat-sen University, X-Era AI Lab