Process-Centric Analysis of Agentic Software Systems

📅 2025-12-02

🤖 AI Summary

Current evaluations of agentic systems overemphasize final outcomes and neglect process-level behaviors such as reasoning, planning, and strategy evolution, which limits insight into operational quality. Method: We propose a process-centric analytical framework, Graphectory, that models execution trajectories as temporal-semantic graphs, enabling fine-grained, outcome-independent, automated workflow assessment. Using 4,000 programming-agent trajectories from SWE-agent and OpenHands, we combine trajectory capture, graph modeling, and LLM-based analysis. Results: The analysis reveals systematic strategy differences and efficiency bottlenecks: stronger LLMs or richer prompts induce more complex exploration and verification; resolved cases tend to follow a clear “locate–fix–verify” pattern, whereas unresolved ones are chaotic, repetitive, or prone to backtracking; even successful runs frequently take unnecessarily long paths. The framework offers an interpretable, quantifiable way to evaluate agent process quality.

📝 Abstract

Agentic systems are modern software systems: they consist of orchestrated modules, expose interfaces, and are deployed in software pipelines. Unlike conventional programs, their executions (i.e., trajectories) are inherently stochastic and adaptive to the problem they are solving. Evaluation of such systems is often outcome-centric, judging their performance by success or failure at the final step. This narrow focus overlooks detailed insights about such systems and fails to explain how agents reason, plan, act, or change their strategies over time. Inspired by the structured representation of conventional software systems as graphs, we introduce Graphectory to systematically encode the temporal and semantic relations in agent trajectories. Graphectory facilitates the design of process-centric metrics and analyses that assess the quality of agentic workflows independent of final success. Using Graphectory, we analyze 4,000 trajectories of two dominant agentic programming workflows, namely SWE-agent and OpenHands, with a combination of four backbone Large Language Models (LLMs), attempting to resolve SWE-bench Verified issues. Our fully automated analyses reveal that: (1) agents using richer prompts or stronger LLMs exhibit more complex Graphectory, reflecting deeper exploration, broader context gathering, and more thorough validation before patch submission; (2) agents' problem-solving strategies vary with both problem difficulty and the underlying LLM: for resolved issues, the strategies often follow coherent localization-patching-validation steps, while unresolved ones exhibit chaotic, repetitive, or backtracking behaviors; (3) even when successful, agentic programming systems often display inefficient processes, leading to unnecessarily prolonged trajectories.
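
To make the idea of encoding a trajectory as a graph concrete, here is a minimal Python sketch. It is not the paper's Graphectory definition, which the abstract does not spell out. It assumes a trajectory is simply an ordered list of action names, treats each distinct action as a node, and treats each observed step-to-step transition as a weighted temporal edge. The names `TrajectoryGraph`, `build_graph`, and `summarize`, and the action labels, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrajectoryGraph:
    """Toy graph: nodes are distinct actions, edges are observed transitions."""
    nodes: set = field(default_factory=set)
    edges: Counter = field(default_factory=Counter)  # (src, dst) -> times seen


def build_graph(actions):
    """Build a graph from an ordered list of action names (assumed format)."""
    graph = TrajectoryGraph()
    graph.nodes.update(actions)
    # Each consecutive pair of steps contributes one temporal edge.
    for src, dst in zip(actions, actions[1:]):
        graph.edges[(src, dst)] += 1
    return graph


def summarize(graph, actions):
    """Simple size metrics that do not depend on whether the task succeeded."""
    return {
        "steps": len(actions),
        "nodes": len(graph.nodes),
        "edges": len(graph.edges),
        # Steps that return to an action already seen earlier in the run.
        "revisits": len(actions) - len(graph.nodes),
    }


# Hypothetical trajectory with one edit/test retry loop.
trajectory = ["search", "open_file", "edit", "run_tests",
              "edit", "run_tests", "submit"]
graph = build_graph(trajectory)
summary = summarize(graph, trajectory)
print(summary)
```

In this toy encoding, a larger node and edge count loosely corresponds to the "more complex" graphs the abstract associates with richer prompts or stronger LLMs, and the revisit count gives a rough signal of repetition. A real analysis would also need the semantic relations the abstract mentions, which this sketch omits.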

Problem

Research questions and friction points this paper is trying to address.

Analyzes agentic software systems' stochastic and adaptive execution trajectories.

Introduces Graphectory to encode temporal and semantic relations for process-centric evaluation.

Evaluates agentic workflows' quality beyond final outcomes using automated metrics and analyses.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Graphectory to encode temporal and semantic relations.

Enables process-centric metrics independent of final outcomes.

Analyzes agent workflows using automated Graphectory-based analysis.
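
As an illustration of a process-centric metric that ignores the final outcome, the sketch below labels each action with a phase, collapses consecutive repeats, and counts how often a run moves back to an earlier phase. This is an assumed, simplified stand-in for the paper's analyses, not its actual method. The phase names follow the localization, patching, validation steps named in the abstract; the action-to-phase mapping and the function names are hypothetical.

```python
# Assumed ordering of phases, taken from the abstract's
# localization-patching-validation description.
PHASE_ORDER = {"localization": 0, "patching": 1, "validation": 2}

# Hypothetical mapping from raw action names to phases.
ACTION_TO_PHASE = {
    "search": "localization",
    "open_file": "localization",
    "edit": "patching",
    "run_tests": "validation",
}


def phase_sequence(actions):
    """Map actions to phases and collapse consecutive duplicates."""
    phases = []
    for action in actions:
        phase = ACTION_TO_PHASE.get(action)
        if phase is None:
            continue  # Skip actions with no assigned phase (e.g. submit).
        if not phases or phases[-1] != phase:
            phases.append(phase)
    return phases


def count_backtracks(phases):
    """Count transitions that move to an earlier phase."""
    return sum(
        1
        for prev, cur in zip(phases, phases[1:])
        if PHASE_ORDER[cur] < PHASE_ORDER[prev]
    )


coherent = ["search", "open_file", "edit", "run_tests", "submit"]
looping = ["search", "edit", "run_tests", "search", "edit",
           "run_tests", "edit", "run_tests"]

coherent_phases = phase_sequence(coherent)
looping_phases = phase_sequence(looping)
print(coherent_phases, count_backtracks(coherent_phases))
print(looping_phases, count_backtracks(looping_phases))
```

Both numbers are computed from the trajectory alone, so they can be compared across resolved and unresolved runs without reference to the final result.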
