🤖 AI Summary
Evaluating LLM-driven agent workflows is costly and inefficient due to repeated LLM invocations. To address this, we propose the first Graph Neural Network (GNN)-based performance prediction framework: workflows are modeled as computational graphs, and end-to-end learning enables efficient, accurate performance estimation—dramatically reducing LLM calls. Our key contributions are: (1) the first systematic application of GNNs to agent workflow performance prediction; (2) FLORA-Bench, the first unified benchmark covering diverse tasks and architectural variants; and (3) empirical validation showing our lightweight model achieves higher prediction accuracy than baselines while reducing evaluation overhead by several orders of magnitude. All code, models, and data are publicly released.
📝 Abstract
Agentic workflows invoked by Large Language Models (LLMs) have achieved remarkable success in handling complex tasks. However, optimizing such workflows is costly and inefficient in real-world applications due to extensive LLM invocations. To address this problem, this position paper formulates agentic workflows as computational graphs and advocates Graph Neural Networks (GNNs) as efficient predictors of agentic workflow performance, avoiding repeated LLM invocations for evaluation. To empirically ground this position, we construct FLORA-Bench, a unified platform for benchmarking GNNs as predictors of agentic workflow performance. With extensive experiments, we arrive at the following conclusion: GNNs are simple yet effective predictors. This conclusion supports new applications of GNNs and a novel direction towards automating agentic workflow optimization. All code, models, and data are available at https://github.com/youngsoul0731/Flora-Bench.
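To make the core idea concrete, here is a minimal, hypothetical sketch (not the FLORA-Bench model or its actual architecture): an agentic workflow is encoded as a DAG whose nodes carry feature vectors (e.g. embeddings of each agent's role or prompt), a few rounds of message passing mix each node's state with the mean of its predecessors', and a graph-level readout maps the pooled state to a predicted success probability. All weights and the toy 3-node workflow below are illustrative placeholders.

```python
import numpy as np

def gnn_predict_success(adj, feats, w_self, w_neigh, w_out, rounds=2):
    """Toy message-passing predictor for a workflow DAG.

    adj[i, j] = 1 means an edge from agent i to agent j.
    Each round, a node combines its own state with the mean
    state of its incoming neighbors; a mean-pool readout then
    yields a scalar success probability. Illustrative only.
    """
    h = feats
    in_deg = adj.sum(axis=0, keepdims=True).clip(min=1)  # in-degree per node
    for _ in range(rounds):
        neigh = (adj.T @ h) / in_deg.T        # mean over incoming edges
        h = np.tanh(h @ w_self + neigh @ w_neigh)
    pooled = h.mean(axis=0)                   # graph-level readout
    logit = pooled @ w_out
    return 1.0 / (1.0 + np.exp(-logit))       # sigmoid -> P(success)

# Toy 3-agent chain: planner -> solver -> verifier
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
feats = rng.normal(size=(3, 4))               # stand-in node embeddings
p = gnn_predict_success(adj, feats,
                        rng.normal(size=(4, 4)),   # placeholder weights
                        rng.normal(size=(4, 4)),
                        rng.normal(size=4))
print(f"predicted success probability: {p:.3f}")
```

The point of the sketch is the cost profile: once trained, a forward pass like this replaces a full workflow rollout, so evaluating a candidate workflow needs no LLM calls at all.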