🤖 AI Summary
Current AI agents exhibit significant limitations in long-horizon planning and domain-knowledge integration, hindering their effectiveness on complex real-world tasks. To address this, we propose the SOP-Driven Agent framework, which formalizes natural-language standard operating procedures (SOPs) into traversable decision graphs and leverages large language models (LLMs) for stepwise reasoning and execution, supported by a multi-domain adaptation architecture. Our key contributions are: (1) the first SOP-guided decision-graph modeling paradigm; and (2) the Grounded Customer Service Benchmark, the first evaluation benchmark grounded in real service scenarios with explicit domain-knowledge alignment. Experiments demonstrate that our framework substantially outperforms general-purpose agents across diverse tasks, including decision-making, search, code generation, data cleaning, and customer service, while matching the performance of custom-built domain-specific agents.
📝 Abstract
Despite significant advancements in general-purpose AI agents, several challenges still hinder their practical application in real-world scenarios. First, the limited planning capabilities of Large Language Models (LLMs) restrict AI agents from effectively solving complex tasks that require long-horizon planning. Second, general-purpose AI agents struggle to efficiently utilize domain-specific knowledge and human expertise. In this paper, we introduce the Standard Operational Procedure-guided Agent (SOP-agent), a novel framework for constructing domain-specific agents through pseudocode-style Standard Operational Procedures (SOPs) written in natural language. Formally, we represent an SOP as a decision graph, which is traversed to guide the agent in completing tasks specified by the SOP. We conduct extensive experiments across tasks in multiple domains, including decision-making, search and reasoning, code generation, data cleaning, and grounded customer service. The SOP-agent demonstrates excellent versatility, achieving performance superior to general-purpose agent frameworks and comparable to domain-specific agent systems. Additionally, we introduce the Grounded Customer Service Benchmark, the first benchmark designed to evaluate the grounded decision-making capabilities of AI agents in customer service scenarios based on SOPs.
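The core idea of representing an SOP as a traversable decision graph can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: the node structure, the toy refund SOP, and the `decide` stub (standing in for the LLM's branch selection) are all assumptions made for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    instruction: str  # natural-language step taken from the SOP
    edges: dict = field(default_factory=dict)  # condition label -> next node name

def traverse(graph, start, decide):
    """Walk the decision graph; `decide` plays the role of the LLM,
    choosing an outgoing edge label at each branching node."""
    trace, current = [], start
    while current is not None:
        node = graph[current]
        trace.append(node.instruction)
        if not node.edges:
            current = None  # terminal step
        elif len(node.edges) == 1:
            current = next(iter(node.edges.values()))
        else:
            choice = decide(node.instruction, list(node.edges))
            current = node.edges[choice]
    return trace

# Toy SOP for a refund request (illustrative only)
graph = {
    "check_order":  Node("Verify the order exists",
                         {"found": "check_window", "missing": "reject"}),
    "check_window": Node("Check the 30-day return window",
                         {"within": "refund", "expired": "reject"}),
    "refund":       Node("Issue the refund"),
    "reject":       Node("Explain why the request is denied"),
}

# Rule-based stand-in for the LLM's grounded branch decision
def decide(instruction, options):
    return options[0]

print(traverse(graph, "check_order", decide))
# → ['Verify the order exists', 'Check the 30-day return window', 'Issue the refund']
```

In the framework described above, the `decide` step would instead prompt an LLM with the current instruction, the task context, and the available branch conditions, so that domain expertise encoded in the SOP constrains the agent's long-horizon behavior.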