SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

📅 2026-02-07
🤖 AI Summary
Large language models struggle to reliably orchestrate long-horizon, multi-step tasks in real-world supply chain scenarios. To address this gap, this work introduces SupChain-Bench, a unified benchmark that evaluates both supply chain domain knowledge and long-horizon tool-use orchestration grounded in standard operating procedures (SOPs) in authentic supply chain settings. It also proposes SupChain-ReAct, an SOP-free framework that generates executable workflows by autonomously synthesizing procedures rather than relying on predefined SOPs. Experimental results show that existing models exhibit substantial gaps in execution reliability, whereas SupChain-ReAct achieves the strongest and most consistent tool-calling performance in the reported experiments.

πŸ“ Abstract
Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this setting, we introduce SupChain-Bench, a unified real-world benchmark that assesses both supply chain domain knowledge and long-horizon tool-based orchestration grounded in standard operating procedures (SOPs). Our experiments reveal substantial gaps in execution reliability across models. We further propose SupChain-ReAct, an SOP-free framework that autonomously synthesizes executable procedures for tool use, achieving the strongest and most consistent tool-calling performance. Our work establishes a principled benchmark for studying reliable long-horizon orchestration in real-world operational settings and highlights significant room for improvement in LLM-based supply chain agents.
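The abstract describes SupChain-ReAct only at a high level. Its name suggests a relation to the ReAct pattern, in which a model interleaves reasoning steps with tool calls and feeds each observation back into its next decision. The sketch below is a minimal, generic ReAct-style loop for orientation only; it is not the authors' framework, and it does not model the SOP-free procedure synthesis the abstract attributes to SupChain-ReAct. The tools `check_inventory` and `create_purchase_order`, the scripted `restock_policy` standing in for an LLM, and the per-step success rate are all hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical in-memory stock table standing in for a real supply chain system.
STOCK = {"SKU-1": 40, "SKU-2": 0}

def check_inventory(sku: str) -> int:
    """Hypothetical tool: return units on hand for a SKU (0 if unknown)."""
    return STOCK.get(sku, 0)

def create_purchase_order(sku: str, qty: int) -> str:
    """Hypothetical tool: return a purchase-order id; reject non-positive quantities."""
    if qty <= 0:
        raise ValueError("quantity must be positive")
    return f"PO-{sku}-{qty}"

TOOLS: dict[str, Callable[..., Any]] = {
    "check_inventory": check_inventory,
    "create_purchase_order": create_purchase_order,
}

@dataclass
class Step:
    thought: str
    action: str
    args: dict
    observation: Any
    ok: bool  # whether the tool call executed without raising

@dataclass
class Trace:
    steps: list[Step] = field(default_factory=list)

    def success_rate(self) -> float:
        """Fraction of tool calls that executed without error."""
        if not self.steps:
            return 0.0
        return sum(s.ok for s in self.steps) / len(self.steps)

# A decision is (thought, tool name, tool arguments), or None to stop.
Decision = Optional[tuple[str, str, dict]]

def restock_policy(sku: str, target: int) -> Callable[[list[Step]], Decision]:
    """Scripted stand-in for an LLM: check stock, then order any shortfall."""
    def policy(history: list[Step]) -> Decision:
        if not history:
            return ("Need current stock first.", "check_inventory", {"sku": sku})
        last = history[-1]
        if last.action == "check_inventory" and last.ok:
            shortfall = target - last.observation
            if shortfall > 0:
                return (f"Short by {shortfall}; place an order.",
                        "create_purchase_order", {"sku": sku, "qty": shortfall})
        return None  # nothing left to do
    return policy

def run_react(policy: Callable[[list[Step]], Decision],
              tools: dict[str, Callable[..., Any]],
              max_steps: int = 10) -> Trace:
    """Alternate reasoning (policy) and acting (tool call), feeding observations back."""
    trace = Trace()
    for _ in range(max_steps):
        decision = policy(trace.steps)
        if decision is None:
            break
        thought, action, args = decision
        try:
            obs, ok = tools[action](**args), True
        except Exception as exc:  # unknown tool or failing call becomes an observation
            obs, ok = f"error: {exc}", False
        trace.steps.append(Step(thought, action, args, obs, ok))
    return trace
```

Running `run_react(restock_policy("SKU-2", 50), TOOLS)` yields two steps, an inventory check followed by the purchase order `PO-SKU-2-50`. A failing or unknown tool call is recorded as an error observation instead of crashing the loop, which is one simple way to make per-step execution reliability measurable over a long horizon.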

Problem

Research questions and friction points this paper is trying to address.

supply chain management
large language models
long-horizon orchestration
tool-based decision making
standard operating procedures

Innovation

Methods, ideas, or system contributions that make the work stand out.

SupChain-Bench
long-horizon orchestration
tool-based reasoning
SOP-free framework
supply chain management