MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the high expertise and cost barriers in writing LAMMPS scripts for molecular dynamics (MD) simulations, a task where existing large language models (LLMs) underperform due to scarce domain-specific data and poor code executability. We propose MDAgent2, the first end-to-end LLM framework tailored for MD, which integrates a domain-specific data pipeline, a three-stage post-training strategy—comprising continued pretraining, supervised fine-tuning, and closed-loop reinforcement learning via simulation feedback (MD-GRPO)—and a multi-agent self-correcting runtime system, MDAgent2-RUNTIME. The resulting MD-Instruct and MD-Code models significantly outperform strong baselines on our newly curated evaluation benchmark, MD-EvalBench, demonstrating the adaptability and generalization capability of large models in industrial-scale scientific simulations.

📝 Abstract
Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three-stage post-training strategy--continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL)--to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench, proposed in this work as the first benchmark for LAMMPS code generation and question answering, our models and system achieve performance surpassing several strong baselines. This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2
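The abstract describes a generate-execute-evaluate-correct loop in MDAgent2-RUNTIME. A minimal sketch of that control flow is below; all function names and the toy LAMMPS-script checks are hypothetical, since the paper's actual agent interfaces are not given in this abstract.

```python
# Illustrative sketch of a closed-loop, self-correcting runtime in the
# spirit of MDAgent2-RUNTIME. Names are hypothetical placeholders.

def run_with_self_correction(generate, execute, evaluate, max_rounds=3):
    """Generate a script, run it, and retry with diagnostic feedback on failure."""
    feedback = None
    script, result = None, None
    for _ in range(max_rounds):
        script = generate(feedback)      # LLM proposes (or repairs) a script
        result = execute(script)         # run the simulation
        ok, feedback = evaluate(result)  # outcome signal drives the next round
        if ok:
            return script, result
    return script, result                # best effort after max_rounds


# Toy stand-ins: a "generator" that adds a missing 'run' command when told to.
def toy_generate(feedback):
    base = "units lj\natom_style atomic\n"
    return base + ("run 1000\n" if feedback else "")

def toy_execute(script):
    return "completed" if "run" in script else "error: no run command"

def toy_evaluate(result):
    return (result == "completed",
            None if result == "completed" else result)

script, result = run_with_self_correction(toy_generate, toy_execute, toy_evaluate)
print(result)  # -> completed
```

In MD-GRPO the evaluation step would additionally map simulation outcomes to scalar rewards and recycle low-reward trajectories into training; the loop above shows only the runtime self-correction skeleton.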
Problem

Research questions and friction points this paper is trying to address.

Molecular Dynamics
Code Generation
Large Language Models
LAMMPS
Domain-specific QA
Innovation

Methods, ideas, or system contributions that make the work stand out.

MDAgent2
domain-adapted LLM
MD-GRPO
LAMMPS code generation
MD-EvalBench
Zhuofan Shi
Peking University
Hubao A
The Hong Kong University of Science and Technology
Yufei Shao
Liaoning Technical University
Dongliang Huang
Peking University
Hongxu An
Peking University
Chunxiao Xin
Peking University
Haiyang Shen
Peking University
Zhenyu Wang
Master of Electronic Information, Peking University
Artificial Intelligence
Yunshan Na
Wenjing Future Lab (Beijing) Technology Co., Ltd.
Gang Huang
Peking University
System Software for Internet Computing
Xiang Jing
Peking University