🤖 AI Summary
This work addresses the lack of effective evaluation methods for assessing the scientific validity of domain-specific language (DSL) code—such as LAMMPS molecular dynamics input scripts—generated by large language models (LLMs). To tackle this challenge, the authors propose a lightweight validation framework that combines input file normalization, an extensible DSL parser, and static syntactic and semantic checks. This approach enables domain experts to efficiently verify LLM-generated outputs without requiring deep expertise in the target DSL. By circumventing costly runtime execution, the framework facilitates systematic benchmarking of mainstream LLMs on scientific DSL generation tasks, revealing their current limitations. The study thus provides a practical pathway toward the safe integration of LLMs into specialized scientific computing workflows.
📝 Abstract
Large language models (LLMs) are changing the way researchers interact with code and data in scientific computing. While their ability to generate general-purpose code is well established, their effectiveness in producing scientifically valid code and input scripts for domain-specific languages (DSLs) remains largely unexplored. We propose an evaluation procedure that enables domain experts (who may not be experts in the DSL) to assess the validity of LLM-generated input files for LAMMPS, a widely used molecular dynamics (MD) code, and to use those assessments to evaluate the performance of state-of-the-art LLMs and identify common issues. Key to the evaluation procedure are a normalization step that produces canonical files and an extensible parser for syntax analysis. Subsequent steps isolate common errors without incurring tests that are costly in time and computational resources. Once a working input file is generated, LLMs can accelerate verification tests. Our findings highlight the limitations of LLMs in generating scientific DSLs and point to a practical path forward for their integration into domain-specific computational ecosystems by domain experts.
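The normalize-then-statically-check pipeline described above can be illustrated with a minimal sketch. This is not the authors' framework; it is a hypothetical Python example assuming LAMMPS conventions (`#` comments, `&` line continuations) and an illustrative whitelist of known commands standing in for the extensible parser:

```python
import re

# Illustrative subset of LAMMPS commands for a static check
# (a real parser would cover the full command grammar).
KNOWN_COMMANDS = {
    "units", "dimension", "boundary", "atom_style", "lattice",
    "region", "create_box", "create_atoms", "mass", "pair_style",
    "pair_coeff", "velocity", "fix", "timestep", "thermo", "run",
}

def normalize(script: str) -> list[str]:
    """Produce canonical command lines: join '&' continuations,
    strip '#' comments, collapse whitespace, drop blank lines."""
    joined = re.sub(r"&\s*\n", " ", script)  # join continuation lines
    lines = []
    for raw in joined.splitlines():
        line = raw.split("#", 1)[0]   # remove trailing comments
        line = " ".join(line.split()) # collapse runs of whitespace
        if line:
            lines.append(line)
    return lines

def check(lines: list[str]) -> list[str]:
    """Flag commands outside the known subset (a cheap static check,
    avoiding any runtime execution of the MD code)."""
    errors = []
    for i, line in enumerate(lines, 1):
        cmd = line.split()[0]
        if cmd not in KNOWN_COMMANDS:
            errors.append(f"line {i}: unknown command '{cmd}'")
    return errors

script = """
units lj            # reduced Lennard-Jones units
atom_style atomic
lattice fcc 0.8442
run &
    1000
"""
print(check(normalize(script)))  # prints [] for this valid subset
```

A real validation layer would add semantic checks on top (e.g. that `units` appears before commands whose arguments depend on it), but even this toy version shows how canonicalization makes such checks tractable for a domain expert who is not fluent in the DSL.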