Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs

📅 2025-07-01
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing LLM benchmarks inadequately assess engineering design capabilities—particularly the integration of cross-disciplinary knowledge, handling of multiple constraints, and generation of goal-directed solutions—due to their static, domain-limited focus (e.g., language understanding or code generation). Method: We introduce EngDesign, the first benchmark explicitly designed for multidisciplinary engineering design. It spans nine engineering domains and pioneers a simulation-driven, dynamic functional verification paradigm that jointly incorporates domain-specific knowledge modeling, constraint solving, and performance testing to enable closed-loop evaluation of design proposals. Contribution/Results: EngDesign overcomes the limitations of static answer-based assessment, significantly improving the fidelity and practicality of evaluating LLMs on complex trade-off analysis, process orchestration, and real-world adaptability. It provides a reproducible, extensible foundation for rigorously assessing and advancing LLMs’ deployment in engineering applications.

📝 Abstract
Modern engineering, spanning electrical, mechanical, aerospace, civil, and computer disciplines, stands as a cornerstone of human civilization and the foundation of our society. However, engineering design poses a fundamentally different challenge for large language models (LLMs) compared with traditional textbook-style problem solving or factual question answering. Although existing benchmarks have driven progress in areas such as language understanding, code synthesis, and scientific problem solving, real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce EngDesign, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains. Unlike existing benchmarks that focus on factual recall or question answering, EngDesign uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented engineering designs. Each task in EngDesign represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. EngDesign pioneers a simulation-based evaluation paradigm that moves beyond textbook knowledge to assess genuine engineering design capabilities and shifts evaluation from static answer checking to dynamic, simulation-driven functional verification, marking a crucial step toward realizing the vision of engineering Artificial General Intelligence (AGI).
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to perform practical engineering design tasks across domains
Assessing synthesis of domain knowledge and reasoning under design constraints
Moving beyond factual recall to functional engineering design verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark evaluates engineering design across domains
Simulation-based paradigm verifies functional engineering capabilities
Emphasizes synthesis of domain knowledge and constraints
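
To make the contrast between static answer checking and simulation-driven functional verification concrete, here is a toy sketch. It is not EngDesign's evaluation harness: the first-order plant, the proportional controller, the thresholds, and the names `Report` and `simulate_and_verify` are all assumptions chosen for illustration. The idea is that a proposed design (here, a single controller gain) is not compared against a reference answer; it is simulated, and the resulting behavior is checked against the stated requirements.

```python
from dataclasses import dataclass


@dataclass
class Report:
    steady_state_error: float
    peak_control_effort: float
    passed: bool


def simulate_and_verify(gain, max_error=0.1, max_effort=20.0,
                        dt=1e-3, t_end=5.0):
    """Simulate a unit-step response and check two hypothetical requirements.

    Assumed plant: dx/dt = -x + u, with proportional control u = gain * (r - x).
    Requirements (illustrative only): final tracking error <= max_error and
    peak |u| <= max_effort.
    """
    x = 0.0          # plant state, starts at rest
    r = 1.0          # unit step reference
    peak_u = 0.0
    for _ in range(int(t_end / dt)):
        u = gain * (r - x)               # candidate design under test
        peak_u = max(peak_u, abs(u))
        x += dt * (-x + u)               # forward-Euler integration step
    error = abs(r - x)
    passed = error <= max_error and peak_u <= max_effort
    return Report(error, peak_u, passed)


if __name__ == "__main__":
    for candidate in (5.0, 10.0, 50.0):
        print(candidate, simulate_and_verify(candidate))
```

In this sketch any gain that meets both requirements is accepted, so multiple distinct designs can pass. A gain that is too low fails on tracking error, and one that is too high fails on control effort, which mirrors the kind of trade-off a design task asks the model to navigate.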
Xing-Gang Guo
University of Illinois at Urbana-Champaign
Yaxin Li
University of Illinois at Urbana-Champaign
Xiangyi Kong
University of Illinois at Urbana-Champaign
Yilan Jiang
University of Illinois at Urbana-Champaign
Xiayu Zhao
University of Illinois at Urbana-Champaign
Zhihua Gong
University of Illinois at Urbana-Champaign
Yufan Zhang
George Mason University
Computer Vision
Daixuan Li
University of Illinois at Urbana-Champaign
Tianle Sang
University of Illinois at Urbana-Champaign
Beixiao Zhu
University of Illinois at Urbana-Champaign
Gregory Jun
University of Illinois at Urbana-Champaign
Yingbing Huang
University of Illinois at Urbana-Champaign
Yiqi Liu
University of Illinois at Urbana-Champaign
Yu Xue
University of Illinois at Urbana-Champaign
Rahul Dev Kundu
University of Illinois at Urbana-Champaign
Qi Jian Lim
University of Illinois at Urbana-Champaign
Yizhou Zhao
University of Pennsylvania
Luke Alexander Granger
University of Illinois at Urbana-Champaign
M. B. Younis
University of Illinois at Urbana-Champaign
Darioush Keivan
University of Illinois at Urbana-Champaign (UIUC)
Machine Learning, Optimization, Control Theory
Nippun Sabharwal
University of Illinois at Urbana-Champaign
Shreyanka Sinha
University of Illinois at Urbana-Champaign
Prakhar Agarwal
University of Illinois at Urbana-Champaign
Kojo E. Vandyck
University of Illinois at Urbana-Champaign
Hanlin Mai
University of Illinois at Urbana-Champaign
Zichen Wang
University of Illinois at Urbana-Champaign
Aditya Venkatesh
University of Illinois at Urbana-Champaign
Ayush Barik
University of Illinois at Urbana-Champaign
Jiankun Yang
University of Illinois at Urbana-Champaign
Chongying Yue
University of Illinois at Urbana-Champaign
Jin-Can He
University of Illinois at Urbana-Champaign
Libin Wang
University of Illinois at Urbana-Champaign
Licheng Xu
University of Illinois at Urbana-Champaign
Hao Chen
University of Illinois at Urbana-Champaign
Jinwen Wang
University of Illinois at Urbana-Champaign
Liujun Xu
University of Illinois at Urbana-Champaign
Rushabh Shetty
University of Illinois at Urbana-Champaign
Zi-Qi Guo
University of Illinois at Urbana-Champaign
Dahui Song
University of Illinois at Urbana-Champaign
Manvi Jha
University of Illinois at Urbana-Champaign
Weijie Liang
University of Illinois at Urbana-Champaign
Weiman Yan
University of Illinois at Urbana-Champaign
Bryan Zhang
University of Illinois at Urbana-Champaign
Sahil Bhandary Karnoor
University of Illinois at Urbana-Champaign
Jialiang Zhang
University of Illinois at Urbana-Champaign
Rutva Pandya
University of Illinois at Urbana-Champaign
Xinyi Gong
CGG
Spherical Indentation, Additive Manufacturing, High Throughput Experimentation, Materials Characterization, Materials Informatic
Mithesh Ballae Ganesh
University of Illinois at Urbana-Champaign
Feize Shi
University of Illinois at Urbana-Champaign
Ruiling Xu
University of Illinois at Urbana-Champaign
Yifan Zhang
University of Illinois at Urbana-Champaign
Yanfeng Ouyang
University of Illinois
Li-feng Qin
University of California San Diego
Elyse Rosenbaum
University of Illinois at Urbana-Champaign
Corey Snyder
University of California San Diego
Peter J. Seiler
University of Michigan
G. Dullerud
University of Illinois at Urbana-Champaign
Xiaojia Shelly Zhang
University of Illinois at Urbana-Champaign
Topology optimization, Soft active materials, Metamaterial, Inverse problems, Computational
Zuofu Cheng
University of Illinois at Urbana-Champaign
P. Hanumolu
University of Illinois at Urbana-Champaign
Jian Huang
University of Illinois at Urbana-Champaign
Mayank Kulkarni
Amazon AGI
Machine Learning, Natural Language Processing, Artificial Intelligence
Mahdi Namazifar
Amazon AGI
Huan Zhang
University of Illinois at Urbana-Champaign
Bin Hu
University of Illinois at Urbana-Champaign