🤖 AI Summary
General-purpose large language models (LLMs) lack domain-specific expertise in composite materials processing and manufacturing equipment operation. Method: We developed two vertical-domain LLM systems that integrate the GPT-4 architecture with industry-specific knowledge bases, employing domain-adaptive fine-tuning and retrieval-augmented generation (RAG). Automated evaluation used ROUGE and BERTScore; human evaluation involved domain experts. Contribution/Results: Our models match or exceed GPT-4o on automated metrics, and expert feedback confirms markedly improved answer specificity, depth, and responsiveness to technical queries. This work presents the first structured modeling of end-to-end operational knowledge in composite manufacturing and its deployment in specialized LLMs, establishing a reusable technical pathway and an empirical benchmark for industrial AI adoption.
📝 Abstract
Engineering curricula and standards cover many material and manufacturing options, yet engineers and designers are often unfamiliar with certain composite materials or manufacturing techniques. Large language models (LLMs) could potentially bridge this gap: their capacity to store and retrieve information from large databases gives them a breadth of knowledge across disciplines. However, their generalized knowledge base can lack targeted, industry-specific expertise. To this end, we present two LLM-based applications built on the GPT-4 architecture: (1) the Composites Guide, a system that provides expert knowledge on composite materials and connects users with research and industry professionals who can offer additional support, and (2) the Equipment Assistant, a system that provides guidance on manufacturing tool operation and material characterization. By combining the knowledge of general AI models with industry-specific knowledge, both applications are intended to provide more meaningful information for engineers. In this paper, we discuss the development of the applications and evaluate them through a benchmark and two informal user studies. The benchmark analysis uses the ROUGE and BERTScore metrics to compare our models against GPT-4o; the results show that the proposed models perform comparably to or better than GPT-4o on both metrics. The two user studies supplement this quantitative evaluation by asking experts to provide qualitative, open-ended feedback on our models' performance on a set of domain-specific questions. The results of both studies highlight the potential of the Composites Guide and the Equipment Assistant to deliver more detailed and specific responses.
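As a rough illustration of the automated metrics named above, ROUGE-1 scores a candidate answer by its unigram overlap with a reference answer. The sketch below is a minimal, self-contained approximation for intuition only; the paper's actual benchmark presumably uses standard implementations (e.g. the `rouge-score` and `bert-score` packages), and BERTScore additionally requires a pretrained transformer to compare token embeddings rather than surface tokens.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Minimal ROUGE-1 F1: unigram overlap between a reference and a candidate.

    Tokenization here is naive whitespace splitting; real implementations
    apply stemming and more careful tokenization.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each shared unigram counts at most min(ref, cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical answers, for illustration only:
score = rouge1_f1(
    "the autoclave cures the laminate under pressure",
    "the autoclave cures a laminate under vacuum pressure",
)
```

A higher F1 indicates closer lexical agreement with the reference answer; BERTScore relaxes this exact-match criterion by matching tokens in embedding space, which is why the two metrics are typically reported together.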