🤖 AI Summary
Industrial automation demands flexible control policies that can adapt to dynamic tasks and environments, yet current LLM-based agents lack standardized planning benchmarks. To address this, we introduce the first LLM agent planning and control benchmark tailored to industrial automation, built on an executable Blocksworld simulation environment spanning five progressively complex task categories. We adopt the Model Context Protocol (MCP) as a unified tool interface, enabling plug-and-play integration and fair evaluation across heterogeneous agent architectures. The benchmark features a modular design, automated evaluation pipelines, and a comprehensive quantitative metric suite; its feasibility has been validated with a single-agent architecture. As an open-source platform, it fills a critical gap in systematic, reproducible benchmarking of LLM agent planning, establishing a foundation for rigorous, scalable research in intelligent control.
📝 Abstract
Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for such adaptive planning and execution but lack standardized benchmarks for systematic comparison. We introduce a benchmark built on an executable simulation environment representing the Blocksworld problem, with tasks organized into five complexity categories. By integrating the Model Context Protocol (MCP) as a standardized tool interface, diverse agent architectures can be connected to and evaluated against the benchmark without implementation-specific modifications. A single-agent implementation demonstrates the benchmark's applicability, establishing quantitative metrics for comparing LLM-based planning and execution approaches.
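To make the setup concrete, a Blocksworld environment reduces to a small state machine whose discrete actions (pick up a block, place it on another block or the table) map naturally onto the kind of tools an MCP server would expose to an agent. The sketch below is illustrative only; the class and method names are assumptions, not the benchmark's actual implementation:

```python
class Blocksworld:
    """Minimal Blocksworld state machine (illustrative sketch, not the
    benchmark's code). Each method mirrors a discrete action an MCP
    server could expose as a tool to an LLM agent."""

    def __init__(self, blocks):
        # State: block -> what it rests on ("table" or another block).
        self.on = {b: "table" for b in blocks}
        self.holding = None  # at most one block held by the gripper

    def clear(self, block):
        # A block is clear if no other block rests on it.
        return all(support != block for support in self.on.values())

    def pick_up(self, block):
        # Precondition: gripper empty and the block has nothing on top.
        assert self.holding is None and self.clear(block)
        del self.on[block]
        self.holding = block

    def put_on(self, target):
        # Precondition: holding a block; target is the table or a clear block.
        assert self.holding is not None
        assert target == "table" or self.clear(target)
        self.on[self.holding] = target
        self.holding = None

    def satisfies(self, goal):
        # goal: dict mapping blocks to their required support.
        return all(self.on.get(b) == s for b, s in goal.items())


env = Blocksworld(["A", "B", "C"])
env.pick_up("A")
env.put_on("B")  # stack A on B
print(env.satisfies({"A": "B"}))  # True
```

An evaluation harness can then score an agent simply by checking `satisfies` against the goal configuration of each task, which is what makes automated, quantitative comparison across agent architectures feasible.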