🤖 AI Summary
Industrial automation demands flexible control policies that can adapt to dynamic tasks and environments, yet current LLM-based agents lack standardized planning benchmarks. To address this, we introduce the first LLM agent planning and control benchmark tailored to industrial automation, built on an executable Blocksworld simulation environment spanning five progressively complex task categories. We adopt the Model Context Protocol (MCP) as a unified tool interface, enabling plug-and-play integration and fair evaluation across heterogeneous agent architectures. The benchmark features a modular design, automated evaluation pipelines, and a comprehensive quantitative metric suite; its feasibility has been validated with a single-agent architecture. As an open-source platform, it fills a critical gap in systematic, reproducible benchmarking of LLM agent planning, establishing a foundation for rigorous, scalable research in intelligent control.
📝 Abstract
Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for such adaptive planning and execution but lack standardized benchmarks for systematic comparison. We introduce a benchmark built on an executable simulation environment representing the Blocksworld problem, with tasks organized into five complexity categories. By integrating the Model Context Protocol (MCP) as a standardized tool interface, diverse agent architectures can be connected to and evaluated against the benchmark without implementation-specific modifications. A single-agent implementation demonstrates the benchmark's applicability, establishing quantitative metrics for comparing LLM-based planning and execution approaches.
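To make the setup concrete, a Blocksworld environment reduces to a small state machine whose discrete actions (pick up a block, place it on another block or the table) map naturally onto the kind of tools an MCP server would expose to an agent. The sketch below is illustrative only; the class and method names are assumptions, not the benchmark's actual implementation:

```python
class Blocksworld:
    """Minimal Blocksworld state machine (illustrative sketch, not the
    benchmark's code). Each method mirrors a discrete action an MCP
    server could expose as a tool to an LLM agent."""

    def __init__(self, blocks):
        # State: block -> what it rests on ("table" or another block).
        self.on = {b: "table" for b in blocks}
        self.holding = None  # at most one block held by the gripper

    def clear(self, block):
        # A block is clear if no other block rests on it.
        return all(support != block for support in self.on.values())

    def pick_up(self, block):
        # Precondition: gripper empty and the block has nothing on top.
        assert self.holding is None and self.clear(block)
        del self.on[block]
        self.holding = block

    def put_on(self, target):
        # Precondition: holding a block; target is the table or a clear block.
        assert self.holding is not None
        assert target == "table" or self.clear(target)
        self.on[self.holding] = target
        self.holding = None

    def satisfies(self, goal):
        # goal: dict mapping blocks to their required support.
        return all(self.on.get(b) == s for b, s in goal.items())


env = Blocksworld(["A", "B", "C"])
env.pick_up("A")
env.put_on("B")  # stack A on B
print(env.satisfies({"A": "B"}))  # True
```

An evaluation harness can then score an agent simply by checking `satisfies` against the goal configuration of each task, which is what makes automated, quantitative comparison across agent architectures feasible.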