Agentic Large Language Models for Conceptual Systems Engineering and Design

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to maintain task continuity and to generate executable engineering models during early-stage design. Method: This paper proposes a structured multi-agent system (MAS), instantiated on a solar-powered water filtration system, that automates requirement extraction, functional decomposition, and simulation code generation. It introduces a Design State Graph (DSG) as a structured knowledge representation and a fine-grained nine-role MAS architecture (replacing conventional two-agent approaches), leveraging Llama 3.3 70B and DeepSeek R1 70B, JSON-serialized graph structures, and coordinated agent protocols for physics-informed modeling and code synthesis. Contribution/Results: The MAS produces more granular design graphs (avg. 5-6 nodes), and reasoning distillation markedly improves workflow completion rates; code compatibility peaked at 100% only under specific two-agent settings, while the MAS averaged below 50%. Requirement coverage remains low (<20%), exposing a semantic-alignment bottleneck between natural-language requirements and formal design representations.

📝 Abstract
Early-stage engineering design involves complex, iterative reasoning, yet existing large language model (LLM) workflows struggle to maintain task continuity and generate executable models. We evaluate whether a structured multi-agent system (MAS) can manage requirements extraction, functional decomposition, and simulator code generation more effectively than a simpler two-agent system (2AS). The target application is a solar-powered water filtration system as described in a cahier des charges. We introduce the Design-State Graph (DSG), a JSON-serializable representation that bundles requirements, physical embodiments, and Python-based physics models into graph nodes. A nine-role MAS iteratively builds and refines the DSG, while the 2AS collapses the process to a Generator-Reflector loop. Both systems run a total of 60 experiments (2 LLMs, Llama 3.3 70B vs. reasoning-distilled DeepSeek R1 70B, × 2 agent configurations × 3 temperatures × 5 seeds). We report JSON validity, requirement coverage, embodiment presence, code compatibility, workflow completion, runtime, and graph size. Across all runs, both the MAS and the 2AS maintained perfect JSON integrity and embodiment tagging. Requirement coverage remained minimal (less than 20%). Code compatibility peaked at 100% under specific 2AS settings but averaged below 50% for the MAS. Only the reasoning-distilled model reliably flagged workflow completion. Powered by DeepSeek R1 70B, the MAS generated more granular DSGs (average 5-6 nodes), whereas the 2AS mode-collapsed. Structured multi-agent orchestration enhanced design detail, and the reasoning-distilled LLM improved completion rates, yet low requirement coverage and fidelity gaps in generated code persisted.
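The DSG described in the abstract can be sketched as a minimal JSON-serializable structure. The field names, example values, and physics snippets below are illustrative assumptions, not the paper's actual schema:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DSGNode:
    """One Design-State Graph node: requirements + embodiment + physics model.
    Field names are hypothetical; the paper only states that nodes bundle these
    three kinds of information."""
    node_id: str
    requirements: list[str]   # natural-language requirements covered by this node
    embodiment: str           # physical embodiment tag
    physics_model: str        # Python source for the node's physics model

@dataclass
class DesignStateGraph:
    nodes: list[DSGNode] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (parent_id, child_id)

    def to_json(self) -> str:
        """Serialize the whole graph, as agents would pass it between roles."""
        return json.dumps(asdict(self), indent=2)

# Example: a minimal two-node graph for the solar-powered filtration system.
dsg = DesignStateGraph()
dsg.nodes.append(DSGNode(
    node_id="pv_array",
    requirements=["Provide sufficient peak electrical power"],
    embodiment="photovoltaic panel",
    physics_model="def power(irradiance_w_m2, area_m2, eff):\n    return irradiance_w_m2 * area_m2 * eff",
))
dsg.nodes.append(DSGNode(
    node_id="filter_unit",
    requirements=["Filter water at the required flow rate"],
    embodiment="membrane filter",
    physics_model="def flow(pressure_pa, resistance):\n    return pressure_pa / resistance",
))
dsg.edges.append(("pv_array", "filter_unit"))
```

Round-tripping `dsg.to_json()` through `json.loads` is one way to check the "JSON validity" metric the paper reports.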
Problem

Research questions and friction points this paper is trying to address.

Improving task continuity in early-stage engineering design using LLMs
Comparing multi-agent vs two-agent systems for requirements extraction and code generation
Evaluating JSON-serializable Design-State Graph for system design representation
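The MAS-vs-2AS comparison above rests on a full-factorial grid of 60 runs, which can be enumerated in a few lines. The concrete temperature values are hypothetical, since the paper only states that three temperatures were used:

```python
from itertools import product

llms = ["Llama 3.3 70B", "DeepSeek R1 70B (reasoning-distilled)"]
configs = ["MAS (nine roles)", "2AS (Generator-Reflector)"]
temperatures = [0.2, 0.7, 1.0]  # hypothetical values; the paper says only "3 temperatures"
seeds = range(5)

# 2 LLMs x 2 agent configurations x 3 temperatures x 5 seeds = 60 experiments
runs = list(product(llms, configs, temperatures, seeds))
assert len(runs) == 60
```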
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured multi-agent system manages design tasks
Design-State Graph bundles requirements and models
Reasoning-distilled LLM improves workflow completion
Soheyl Massoudi
IDEAL, Chair of Artificial Intelligence in Engineering Design, ETH Zurich, Zurich, Switzerland
Mark Fuge
ETH Zurich
Product Design · Machine Learning · Statistics · Design Creativity · Computational Design