Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Industrial automation demands flexible control policies capable of adapting to dynamic tasks and environments, yet LLM-based agents lack standardized planning benchmarks. To address this, we introduce the first LLM agent planning and control benchmark tailored for industrial automation, built on an executable Blocksworld simulation environment spanning five progressively complex task categories. We adopt the Model Context Protocol (MCP) as a unified tool interface, enabling plug-and-play integration and fair evaluation across heterogeneous agent architectures. The benchmark features a modular design, automated evaluation pipelines, and a quantitative metric suite; its feasibility has been validated with a single-agent implementation. As an open-source platform, it fills a gap in systematic, reproducible benchmarking for LLM agent planning, establishing a foundation for rigorous, scalable research in intelligent control.
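The summary describes MCP as a unified tool interface through which heterogeneous agents invoke environment actions by name. The sketch below is a minimal illustration of that pattern in plain Python (it does not use the actual MCP SDK or the paper's code; all names such as `BlocksWorld`, `make_tools`, and `call_tool` are hypothetical): the agent never touches the simulator object directly, only a registry of named tools invoked with keyword arguments.

```python
# Illustrative sketch (not the paper's API): a Blocksworld simulator whose
# actions are exposed through a uniform named-tool interface, mimicking how
# MCP lets any agent invoke environment tools without implementation-specific
# modifications.

class BlocksWorld:
    """Table of stacks; each stack is a list of block names, bottom first."""

    def __init__(self, stacks):
        self.stacks = [list(s) for s in stacks]

    def pick_and_place(self, block, target):
        """Move `block` (which must be clear, i.e. on top of its stack) onto
        the stack at index `target`, or onto the table (new stack) if None."""
        for s in self.stacks:
            if s and s[-1] == block:
                s.pop()
                if target is None:
                    self.stacks.append([block])
                else:
                    self.stacks[target].append(block)
                self.stacks = [s for s in self.stacks if s]  # drop empty stacks
                return {"ok": True, "state": [list(s) for s in self.stacks]}
        return {"ok": False, "error": f"{block} is not a clear block"}


def make_tools(world):
    """Tool registry: the only surface an agent sees."""
    return {
        "pick_and_place": lambda **kw: world.pick_and_place(**kw),
        "observe": lambda **kw: {"state": [list(s) for s in world.stacks]},
    }


def call_tool(tools, name, **kwargs):
    """Single entry point, analogous to an MCP tool invocation."""
    if name not in tools:
        return {"ok": False, "error": f"unknown tool {name}"}
    return tools[name](**kwargs)
```

Because every agent architecture goes through the same `call_tool` surface, swapping one agent for another requires no changes on the environment side, which is the plug-and-play property the benchmark attributes to MCP.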

📝 Abstract
Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for such adaptive planning and execution but lack standardized benchmarks for systematic comparison. We introduce a benchmark with an executable simulation environment representing the Blocksworld problem, providing five complexity categories. By integrating the Model Context Protocol (MCP) as a standardized tool interface, diverse agent architectures can be connected to and evaluated against the benchmark without implementation-specific modifications. A single-agent implementation demonstrates the benchmark's applicability, establishing quantitative metrics for comparison of LLM-based planning and execution approaches.
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized benchmarks for LLM agents in adaptive planning and control
Absence of executable simulation environments with graded task complexity
Difficulty comparing heterogeneous agent architectures without a common tool interface
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Blocksworld benchmark with five complexity categories
Integrates Model Context Protocol for standardized agent interface
Provides executable simulation for LLM planning evaluation
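The benchmark's automated evaluation pipeline scores an agent's output against a goal configuration. The following sketch shows, under assumed conventions (it is not the paper's pipeline; `move`, `goal_reached`, and `evaluate_plan` are illustrative names, and a plan is simplified to a list of `(block, target_stack)` moves), the kind of quantitative metrics such a pipeline could report: goal success, plan length, and invalid-move count.

```python
# Illustrative evaluation sketch (not the paper's actual pipeline).

def move(stacks, block, target):
    """Apply one move in place: put clear `block` onto stack index `target`
    (None = a new stack on the table). Returns False for invalid moves."""
    for s in stacks:
        if s and s[-1] == block:
            s.pop()
            if target is None:
                stacks.append([block])
            else:
                stacks[target].append(block)
            stacks[:] = [s for s in stacks if s]  # drop empty stacks
            return True
    return False


def goal_reached(stacks, goal):
    """A goal here is simply a target set of stacks, order-insensitive."""
    return sorted(map(tuple, stacks)) == sorted(map(tuple, goal))


def evaluate_plan(initial, goal, plan, apply_action=move):
    """Execute `plan` (a list of (block, target) moves) from `initial` and
    report success plus simple quantitative metrics."""
    stacks = [list(s) for s in initial]
    invalid = 0
    for block, target in plan:
        if not apply_action(stacks, block, target):
            invalid += 1  # invalid moves are counted but do not abort the run
    return {
        "success": goal_reached(stacks, goal),
        "steps": len(plan),
        "invalid_moves": invalid,
    }
```

Running every agent's plan through the same `evaluate_plan` function is what makes scores comparable across architectures: the metrics depend only on the executed action sequence, not on how the agent produced it.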
Niklas Jobs
Institute of Automation Technology, Helmut Schmidt University / University of the Federal Armed Forces Hamburg, Germany
Luis Miguel Vieira da Silva
Institute of Automation Technology, Helmut Schmidt University / University of the Federal Armed Forces Hamburg, Germany
Jayanth Somashekaraiah
Institute of Automation Technology, Helmut Schmidt University / University of the Federal Armed Forces Hamburg, Germany
Maximilian Weigand
Institute of Automation Technology, Helmut Schmidt University / University of the Federal Armed Forces Hamburg, Germany
David Kube
Siemens AG, Nuremberg, Germany
Felix Gehlhoff
Institute of Automation Technology, Helmut Schmidt University / University of the Federal Armed Forces Hamburg, Germany
Agent-based systems, decentralised scheduling