Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit hallucination and low modeling accuracy when interfacing with closed-source simulation software (e.g., multibody dynamics tools), primarily due to their lack of access to proprietary domain knowledge. Method: This paper proposes and implements the first retrieval-augmented generation (RAG) framework tailored for closed-source multibody simulation software. It constructs a structured, private knowledge base to dynamically align users’ natural-language instructions with software-specific concepts, APIs, and modeling paradigms. Contribution/Results: The framework significantly enhances LLMs’ comprehension depth and generation reliability in knowledge-intensive tasks such as model creation. Experiments demonstrate a 42.6% improvement in accuracy across critical modeling steps, validating the approach’s feasibility and effectiveness. Analysis further identifies incompleteness of knowledge coverage and suboptimal retrieval precision as primary performance bottlenecks. This work establishes a reusable technical pathway and empirical foundation for integrating LLMs with industrial-grade closed-source simulation tools.
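The core mechanism the summary describes is retrieval-augmented generation: before the LLM answers, relevant snippets from a private knowledge base are retrieved and prepended to the prompt so the model can ground its response in software-specific knowledge. The paper does not publish its implementation; the sketch below is an illustrative toy version under stated assumptions — the knowledge-base snippets are invented, and the bag-of-words cosine scoring stands in for the neural embedding retrieval a real RAG system would use.

```python
# Minimal RAG retrieval sketch. The knowledge-base snippets below are
# hypothetical examples for a closed-source multibody tool, not taken
# from the paper; real systems use neural embeddings, not bag-of-words.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (word counts, punctuation stripped)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical private knowledge base (the paper's is proprietary).
knowledge_base = [
    "A rigid body is created with the Body command and requires mass and inertia.",
    "Joints connect two bodies and constrain their relative motion.",
    "Solver settings control the integration tolerance and step size.",
]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str) -> str:
    """Prepend retrieved context so the LLM answers from domain knowledge."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("How do I create a rigid body?")
```

The augmented `prompt` would then be sent to the LLM; the paper's reported bottlenecks (incomplete knowledge coverage, suboptimal retrieval precision) correspond to gaps in `knowledge_base` and errors in the ranking step of `retrieve`.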

📝 Abstract
Large Language Models (LLMs) are increasingly helpful in text generation, even writing code in programming languages based on user prompts written in natural language. They have even been applied to generate simulation models for multibody systems from natural language. Research results suggest that LLMs can surpass the mere replication of existing code examples when they have been trained on an open-source multibody simulation code. However, for closed-source simulation software, such results are not to be expected, as its ideas and concepts might differ from publicly available ones. LLMs can hallucinate on knowledge-intensive tasks, such as model creation, which can lead to wrong responses. This is especially the case for closed-source simulation software that is unknown to the LLM. The same applies to other internal knowledge kept private to protect intellectual property or data privacy. The Retrieval-Augmented Generation (RAG) approach might yield a solution for these knowledge-intensive tasks. This paper explores the application of RAG to closed-source simulation software and presents first experiments. After a brief introduction to LLMs, the RAG approach, and the simulation method applied by the closed-source simulation software, several examples are provided to test LLMs' knowledge of the simulation software and the creation of simulation models using two RAG systems. The examples show promising results, indicating the benefits of applying RAG systems to closed-source simulation software and helping to access their knowledge. Nevertheless, they also reveal gaps in the applied information and open questions for further research.
Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs for closed-source simulation software
Address hallucination in LLM model creation
Apply RAG to improve knowledge-intensive tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation for simulations
LLMs access closed-source software knowledge
RAG reduces LLM hallucination in models
Andreas Baumann
Institute of Engineering and Computational Mechanics, University of Stuttgart, Pfaffenwaldring 9, 70569 Stuttgart, Germany
Peter Eberhard
Professor, University of Stuttgart, Germany