🤖 AI Summary
Existing LLM evaluation benchmarks do not systematically assess physical feasibility reasoning in engineering construction automation, in particular how faithfully a model translates natural-language specifications into constructible structures. To address this gap, we introduce BuildArena, the first physics-aligned, interactive benchmark platform for language-driven construction. It integrates Newtonian mechanics, static and dynamic constraints, and 3D geometric computation to formulate a multi-level, scalable task suite. We design an agent-based automated evaluation pipeline that unifies natural-language understanding, parametric geometric modeling, and physics-based simulation. Empirical evaluation across eight state-of-the-art LLMs quantifies capability gaps in structural stability, force equilibrium, and spatial feasibility, exposing performance bottlenecks that earlier benchmarks did not measure. Our work establishes a reproducible, extensible evaluation paradigm and open benchmark infrastructure to advance AI-enabled intelligent construction.
📝 Abstract
Engineering construction automation aims to transform natural language specifications into physically viable structures, which requires complex integrated reasoning under strict physical constraints. While modern LLMs possess broad knowledge and strong reasoning capabilities that make them promising candidates for this domain, their construction competencies remain largely unevaluated. To address this gap, we introduce BuildArena, the first physics-aligned interactive benchmark designed for language-driven engineering construction. It makes four contributions: (1) a highly customizable benchmarking framework for in-depth comparison and analysis of LLMs; (2) an extensible task design strategy spanning static and dynamic mechanics across multiple difficulty tiers; (3) a 3D Spatial Geometric Computation Library that supports construction from language instructions; (4) a baseline LLM agentic workflow that effectively evaluates diverse model capabilities. We use BuildArena to comprehensively evaluate eight frontier LLMs on language-driven, physics-grounded construction automation. The project page is at https://build-arena.github.io/.