🤖 AI Summary
Existing LLM benchmarks inadequately assess engineering design capabilities—particularly the integration of cross-disciplinary knowledge, handling of multiple constraints, and generation of goal-directed solutions—due to their static, domain-limited focus (e.g., language understanding or code generation).
Method: We introduce EngDesign, the first benchmark explicitly designed for multidisciplinary engineering design. It spans nine engineering domains and pioneers a simulation-driven, dynamic functional-verification paradigm that jointly incorporates domain-specific knowledge modeling, constraint solving, and performance testing, enabling closed-loop evaluation of design proposals.
Contribution/Results: EngDesign overcomes the limitations of static answer-based assessment, significantly improving the fidelity and practicality of evaluating LLMs on complex trade-off analysis, process orchestration, and real-world adaptability. It provides a reproducible, extensible foundation for rigorously assessing and advancing LLMs’ deployment in engineering applications.
📝 Abstract
Modern engineering, spanning electrical, mechanical, aerospace, civil, and computer disciplines, stands as a cornerstone of human civilization and the foundation of our society. However, engineering design poses a fundamentally different challenge for large language models (LLMs) compared with traditional textbook-style problem solving or factual question answering. Although existing benchmarks have driven progress in areas such as language understanding, code synthesis, and scientific problem solving, real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce EngDesign, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains. Unlike existing benchmarks that focus on factual recall or question answering, EngDesign uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented engineering designs. Each task in EngDesign represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. EngDesign pioneers a simulation-based evaluation paradigm that moves beyond textbook knowledge to assess genuine engineering design capabilities, shifting evaluation from static answer checking to dynamic, simulation-driven functional verification. This marks a crucial step toward realizing the vision of engineering Artificial General Intelligence (AGI).