🤖 AI Summary
Preparing input files for protein molecular dynamics (MD) simulations is time-consuming and error-prone. This paper presents an end-to-end pipeline, described as the first of its kind, that combines a large language model (Gemini 2.0 Flash) with web automation implemented in Python and Selenium. The model writes, executes, and iteratively revises scripts that drive the CHARMM-GUI web interface and generate NAMD input files automatically. The key contributions are: (i) an LLM-driven, self-correcting workflow for interactive, GUI-based parameter generation that runs largely without manual intervention; and (ii) an integrated post-processing module that supports parallel configuration of multiple protein systems. The reported evaluation shows an over 80% reduction in simulation setup time, fewer human-induced errors, and good robustness and scalability. The framework offers an efficient, automated, and standardized workflow for computational biophysics.
📝 Abstract
Molecular dynamics (MD) simulations are an essential tool for understanding protein structure, dynamics, and function at the atomic level. However, preparing high-quality input files for MD simulations can be a time-consuming and error-prone process. In this work, we introduce an automated pipeline that leverages Large Language Models (LLMs), specifically Gemini 2.0 Flash, in conjunction with Python scripting and Selenium-based web automation to streamline the generation of MD input files. The pipeline exploits CHARMM-GUI's comprehensive web-based interface for preparing simulation-ready inputs for NAMD. By integrating Gemini's code generation and iterative refinement capabilities, the pipeline automatically writes, executes, and revises scripts that navigate CHARMM-GUI, extract appropriate parameters, and produce the required NAMD input files. Post-processing is performed using additional software to further refine the simulation outputs, thereby enabling a complete and largely hands-free workflow. Our results demonstrate that this approach reduces setup time, minimizes manual errors, and offers a scalable solution for handling multiple protein systems in parallel. This automated framework paves the way for broader application of LLMs in computational structural biology, offering a robust and adaptable platform for future developments in simulation automation.
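The write, execute, and revise cycle described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the Gemini call and the Selenium navigation of CHARMM-GUI are replaced by a stub, and every name here (`run_script`, `refine_until_success`, `fake_llm`) is a hypothetical helper introduced only for illustration. The sketch assumes that a generated script signals failure through a non-zero exit code and that its error output is passed back to the model as feedback.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical signature for the model call: (task, feedback) -> script source.
Generator = Callable[[str, Optional[str]], str]


def run_script(source: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Write generated source to a temporary file and run it in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "generated_step.py"
        path.write_text(source, encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )


def refine_until_success(generate: Generator, task: str, max_rounds: int = 3) -> str:
    """Ask for a script, run it, and feed errors back until it exits cleanly."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = generate(task, feedback)
        try:
            result = run_script(source)
        except subprocess.TimeoutExpired:
            feedback = "The script timed out."
            continue
        if result.returncode == 0:
            return source  # the script ran without error
        feedback = result.stderr  # the error text becomes the next prompt's context
    raise RuntimeError(f"No working script after {max_rounds} rounds")


def fake_llm(task: str, feedback: Optional[str]) -> str:
    """Stub standing in for the LLM: fails on the first attempt, succeeds once it sees feedback."""
    if feedback is None:
        return "raise SystemExit('missing parameter')"
    return "print('inputs written')"


if __name__ == "__main__":
    print(refine_until_success(fake_llm, "prepare NAMD inputs"))
```

In a real pipeline, the stub would be replaced by a call to the model, and the generated script would contain the browser automation itself. Bounding the number of rounds keeps a script that never converges from looping indefinitely.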