On the Evaluation of Engineering Artificial General Intelligence

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Engineering Artificial General Intelligence (eAGI) lacks systematic, scalable evaluation methodologies for physical system and controller design. Method: This paper introduces the first domain-specific, extensible eAGI evaluation framework for engineering. It adapts Bloom's Taxonomy to the cognitive hierarchy of engineering design; integrates engineering knowledge graphs with structured CAD/SysML model parsing to construct a multi-granularity question bank; and establishes a hierarchical capability assessment system supporting quantitative evaluation across four dimensions: knowledge retrieval, tool operation, component comprehension, and cross-domain innovation. Contribution: The framework enables automated, customizable evaluation workflows and cross-domain transferability, and provides a benchmark generation methodology spanning from methodological knowledge to real-world engineering problems. Empirical results demonstrate improved measurability and comparability of eAGI systems on complex engineering design tasks.
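
The four-dimension capability profile described above suggests a simple aggregation step from per-question scores to per-dimension scores. Below is a minimal, hypothetical sketch (Python); the class names, uniform averaging, and demo values are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: dimension names follow the summary; everything else is assumed.
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four capability dimensions named in the summary."""
    KNOWLEDGE_RETRIEVAL = "knowledge_retrieval"
    TOOL_OPERATION = "tool_operation"
    COMPONENT_COMPREHENSION = "component_comprehension"
    CROSS_DOMAIN_INNOVATION = "cross_domain_innovation"


@dataclass
class QuestionResult:
    dimension: Dimension
    score: float  # normalized per-question score in [0, 1]


def dimension_profile(results: list[QuestionResult]) -> dict[Dimension, float]:
    """Aggregate per-question scores into a per-dimension capability profile."""
    profile: dict[Dimension, float] = {}
    for dim in Dimension:
        scores = [r.score for r in results if r.dimension is dim]
        profile[dim] = sum(scores) / len(scores) if scores else 0.0
    return profile


if __name__ == "__main__":
    demo = [
        QuestionResult(Dimension.KNOWLEDGE_RETRIEVAL, 0.9),
        QuestionResult(Dimension.TOOL_OPERATION, 0.6),
        QuestionResult(Dimension.COMPONENT_COMPREHENSION, 0.7),
        QuestionResult(Dimension.CROSS_DOMAIN_INNOVATION, 0.4),
    ]
    print(dimension_profile(demo))
```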

📝 Abstract
We discuss the challenges and propose a framework for evaluating engineering artificial general intelligence (eAGI) agents. We consider eAGI as a specialization of artificial general intelligence (AGI), deemed capable of addressing a broad range of problems in the engineering of physical systems and associated controllers. We exclude software engineering for a tractable scoping of eAGI and expect dedicated software engineering AI agents to address the software implementation challenges. Similar to human engineers, eAGI agents should possess a unique blend of background knowledge (recall and retrieve) of facts and methods, demonstrate familiarity with tools and processes, exhibit deep understanding of industrial components and well-known design families, and be able to engage in creative problem solving (analyze and synthesize), transferring ideas acquired in one context to another. Given this broad mandate, evaluating and qualifying the performance of eAGI agents is a challenge in itself and, arguably, a critical enabler to developing eAGI agents. In this paper, we address this challenge by proposing an extensible evaluation framework that specializes and grounds Bloom's taxonomy - a framework for evaluating human learning that has also been recently used for evaluating LLMs - in an engineering design context. Our proposed framework advances the state of the art in benchmarking and evaluation of AI agents in terms of the following: (a) developing a rich taxonomy of evaluation questions spanning from methodological knowledge to real-world design problems; (b) motivating a pluggable evaluation framework that can evaluate not only textual responses but also structured design artifacts such as CAD models and SysML models; and (c) outlining an automatable procedure to customize the evaluation benchmark to different engineering contexts.
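
Contribution (b), the pluggable evaluation of both textual responses and structured artifacts, can be pictured as dispatch over artifact-type-specific evaluator plugins. A minimal sketch of that idea follows (Python); the interface, registry, and placeholder scoring rules are assumptions for illustration, not the paper's API.

```python
# Hypothetical sketch of a pluggable evaluator registry; names and scoring are assumed.
from abc import ABC, abstractmethod


class Evaluator(ABC):
    """One plugin per artifact type (free text, CAD model, SysML model, ...)."""

    @abstractmethod
    def evaluate(self, reference: str, submission: str) -> float:
        """Return a normalized score in [0, 1]."""


class TextEvaluator(Evaluator):
    def evaluate(self, reference: str, submission: str) -> float:
        # Placeholder: token overlap stands in for an LLM- or rubric-based grader.
        ref, sub = set(reference.lower().split()), set(submission.lower().split())
        return len(ref & sub) / len(ref) if ref else 0.0


class SysMLEvaluator(Evaluator):
    def evaluate(self, reference: str, submission: str) -> float:
        # Placeholder: a real plugin would parse both models and compare their structure.
        return 1.0 if reference.strip() == submission.strip() else 0.0


EVALUATORS: dict[str, Evaluator] = {
    "text": TextEvaluator(),
    "sysml": SysMLEvaluator(),
}


def score(artifact_type: str, reference: str, submission: str) -> float:
    """Dispatch the grading of one submission to the plugin registered for its type."""
    return EVALUATORS[artifact_type].evaluate(reference, submission)


if __name__ == "__main__":
    print(score("text",
                "a worm gear reduces speed and increases torque",
                "a worm gear increases torque while reducing speed"))
```
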
Problem

Research questions and friction points this paper is trying to address.

Evaluating engineering artificial general intelligence (eAGI) agents' performance
Developing a framework for assessing eAGI in engineering design contexts
Creating an extensible benchmark for eAGI's knowledge and problem-solving abilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extensible evaluation framework based on Bloom's taxonomy
Pluggable framework for evaluating textual responses and structured design artifacts (CAD, SysML models)
Automatable benchmark customization for engineering contexts (see the sketch below)
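
Read as template instantiation over a domain knowledge graph, the customization step could look like the toy sketch below (Python); the graph triples, question templates, and Bloom-style level labels are invented for illustration, not drawn from the paper.

```python
# Hypothetical sketch of customizing a question bank from a small domain knowledge graph.
from itertools import product

# Toy knowledge graph: (component, relation, value) triples for one engineering domain.
KNOWLEDGE_GRAPH = [
    ("ball bearing", "load_rating", "dynamic load C"),
    ("spur gear", "failure_mode", "tooth bending fatigue"),
    ("DC motor", "key_parameter", "torque constant"),
]

# Question templates keyed by a (loosely) Bloom-style cognitive level.
TEMPLATES = {
    "recall": "What is the {relation} of a {component}?",
    "analyze": "How would changing the {relation} of a {component} affect a design that uses it?",
}


def generate_questions(graph, templates):
    """Instantiate every template against every triple in the graph."""
    questions = []
    for (component, relation, _value), (level, template) in product(graph, templates.items()):
        questions.append({
            "level": level,
            "text": template.format(component=component, relation=relation.replace("_", " ")),
        })
    return questions


if __name__ == "__main__":
    for q in generate_questions(KNOWLEDGE_GRAPH, TEMPLATES):
        print(f"[{q['level']}] {q['text']}")
```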