AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the absence of a multimodal evaluation benchmark tailored to autonomous agents in the Architecture, Engineering, and Construction (AEC) domain, a gap that has hindered assessment of agent capabilities on real-world tasks such as drawing comprehension, cross-drawing reasoning, and project-level collaboration. To close this gap, the authors introduce AEC-Bench, the first comprehensive multimodal benchmark designed specifically for AEC agents, featuring standardized datasets, evaluation protocols, and baseline systems that cover these core challenges. Combining multimodal processing, agent architectures, and cross-document reasoning, AEC-Bench uses foundation-model harnesses such as Claude Code and Codex as baselines to test which general-purpose tools and agent design strategies are effective. The complete benchmark suite is open-sourced, improving the evaluability and reproducibility of agent-based research in the AEC field.
📝 Abstract
AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) domain. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination. This report describes the benchmark's motivation, dataset taxonomy, evaluation protocol, and baseline results across several domain-specific foundation-model harnesses. We use AEC-Bench to identify tools and harness design techniques that consistently improve performance across foundation models running in their own base harnesses, such as Claude Code and Codex. We openly release our benchmark dataset, agent harness, and evaluation code for full reproducibility at https://github.com/nomic-ai/aec-bench under an Apache 2.0 license.
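Neither the abstract nor this page specifies the benchmark's actual API, so the sketch below is purely illustrative: a minimal Python evaluation loop over hypothetical AEC-Bench task records. The JSONL schema, the `run_agent` callback, and the exact-match scoring are all assumptions for illustration, not the released harness.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical task record (NOT the released schema):
#   {"task_id": ..., "category": "drawing_understanding" |
#    "cross_sheet_reasoning" | "project_coordination",
#    "inputs": [...], "answer": ...}

def evaluate(tasks_path: Path, run_agent: Callable[[dict], str]) -> dict[str, float]:
    """Run an agent callback over each task and report per-category accuracy."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    with tasks_path.open() as f:
        for line in f:
            task = json.loads(line)
            cat = task["category"]
            totals[cat] = totals.get(cat, 0) + 1
            prediction = run_agent(task)  # e.g. a Claude Code or Codex harness call
            # Assumed exact-match scoring; the real protocol may use richer metrics.
            if prediction.strip() == str(task["answer"]).strip():
                correct[cat] = correct.get(cat, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}

if __name__ == "__main__":
    # Trivial stand-in agent for demonstration; a real baseline would invoke
    # a foundation-model harness with the task's drawings and question.
    scores = evaluate(Path("tasks.jsonl"), run_agent=lambda task: "")
    for category, acc in sorted(scores.items()):
        print(f"{category}: {acc:.2%}")
```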
Problem

Research questions and friction points this paper is trying to address.

AEC
agentic systems
multimodal benchmark
drawing understanding
cross-sheet reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
agentic systems
AEC domain
cross-sheet reasoning
foundation model harness