🤖 AI Summary
Existing LLM benchmarks inadequately assess engineering design capabilities—particularly the integration of cross-disciplinary knowledge, handling of multiple constraints, and generation of goal-directed solutions—due to their static, domain-limited focus (e.g., language understanding or code generation).
Method: We introduce EngDesign, the first benchmark explicitly designed for multidisciplinary engineering design. It spans nine engineering domains and pioneers a simulation-driven, dynamic functional-verification paradigm that jointly incorporates domain-specific knowledge modeling, constraint solving, and performance testing, enabling closed-loop evaluation of design proposals.
Contribution/Results: EngDesign overcomes the limitations of static answer-based assessment, significantly improving the fidelity and practicality of evaluating LLMs on complex trade-off analysis, process orchestration, and real-world adaptability. It provides a reproducible, extensible foundation for rigorously assessing and advancing LLMs’ deployment in engineering applications.
📝 Abstract
Modern engineering, spanning electrical, mechanical, aerospace, civil, and computer disciplines, stands as a cornerstone of human civilization and the foundation of our society. However, engineering design poses a fundamentally different challenge for large language models (LLMs) compared with traditional textbook-style problem solving or factual question answering. Although existing benchmarks have driven progress in areas such as language understanding, code synthesis, and scientific problem solving, real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce EngDesign, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains. Unlike existing benchmarks that focus on factual recall or question answering, EngDesign uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented engineering designs. Each task in EngDesign represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. EngDesign pioneers a simulation-based evaluation paradigm that moves beyond textbook knowledge to assess genuine engineering design capabilities, shifting evaluation from static answer checking to dynamic, simulation-driven functional verification. This marks a crucial step toward realizing the vision of engineering Artificial General Intelligence (AGI).