📝 Abstract
We introduce ColPackAgent, an agent framework that autonomously runs Monte Carlo simulations of colloidal packing through a Model Context Protocol (MCP) tool server and an agent skill, either as a standalone agent or within an existing agent system. Colloidal packing simulations are central to studies of phase behavior, self-assembly, and materials design. Without dedicated simulation tools and workflow instructions, however, general-purpose large language model (LLM) agents tend to describe such workflows rather than execute them reliably. The MCP server exposes colpack, a custom-built Python package that wraps HOOMD-blue hard-particle Monte Carlo, and the skill encodes a four-stage workflow contract, so that together they let ColPackAgent execute a structured simulation workflow. ColPackAgent can carry out this workflow interactively with human feedback, autonomously from an end-to-end prompt, or as autoresearch that follows a provided program file. We demonstrate the system in different modes with several colloidal packing examples: cubes in 3D, a binary mixture of disks and capsules in 2D, and, using autoresearch, the 2D hard-disk freezing transition. We also compare how a panel of LLMs performs on this workflow using 17 stage-specific prompts; this benchmark provides a stage-level check of how reliably each model follows the setup, planning, and analysis workflow. Together, these results show that pairing a domain Python package with MCP tools and a portable agent skill offers a practical route for turning a simulation toolkit into an agent-assisted research workflow.
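To make the architecture above more concrete, here is a minimal pure-Python sketch of the kind of computation an MCP tool in such a system might expose: NVT Monte Carlo for hard disks in a periodic square box, where the Metropolis rule reduces to "reject any move that creates an overlap." Everything here is a hypothetical illustration, not the colpack package, HOOMD-blue's API, or the official MCP SDK. The names `hard_disk_mc`, `TOOLS`, and `handle_request` are invented for this sketch, and the request/response shape is only loosely modeled on an MCP `tools/call` message carried over JSON-RPC 2.0.

```python
import json
import math
import random


def hard_disk_mc(n_side=8, packing_fraction=0.5, sweeps=50, max_step=0.1, seed=0):
    """Hypothetical NVT hard-disk Monte Carlo (diameter 1, periodic square box).

    Illustrative stand-in only; this is not colpack's or HOOMD-blue's API.
    """
    rng = random.Random(seed)
    n = n_side * n_side
    # Choose box side L so that n * pi * (1/2)^2 / L^2 equals the packing fraction.
    box = math.sqrt(n * math.pi / (4.0 * packing_fraction))
    spacing = box / n_side
    if spacing < 1.0:
        raise ValueError("packing fraction too high for a square-lattice start")
    # Square-lattice start: overlap-free because spacing >= 1 diameter.
    pos = [((i + 0.5) * spacing, (j + 0.5) * spacing)
           for i in range(n_side) for j in range(n_side)]

    def overlaps(k, x, y):
        for m, (xm, ym) in enumerate(pos):
            if m == k:
                continue
            dx, dy = x - xm, y - ym
            # Minimum-image convention for the periodic box.
            dx -= box * round(dx / box)
            dy -= box * round(dy / box)
            if dx * dx + dy * dy < 1.0:  # centers closer than one diameter
                return True
        return False

    accepted = 0
    for _ in range(sweeps):
        for _ in range(n):
            k = rng.randrange(n)
            # Symmetric uniform trial displacement, wrapped back into the box.
            x = (pos[k][0] + rng.uniform(-max_step, max_step)) % box
            y = (pos[k][1] + rng.uniform(-max_step, max_step)) % box
            # Hard particles: accept the move if and only if it creates no overlap.
            if not overlaps(k, x, y):
                pos[k] = (x, y)
                accepted += 1
    return {"n": n, "box": box, "acceptance": accepted / (sweeps * n), "positions": pos}


# Hypothetical tool registry standing in for an MCP tool server.
TOOLS = {"hard_disk_mc": hard_disk_mc}


def handle_request(raw):
    """Dispatch a JSON-RPC 2.0 request loosely shaped like an MCP 'tools/call'."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": -32601, "message": "Method not found"}})
    params = req["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        result = {"content": [{"type": "text", "text": "unknown tool"}], "isError": True}
    else:
        out = tool(**params.get("arguments", {}))
        summary = {"n": out["n"], "acceptance": round(out["acceptance"], 3)}
        result = {"content": [{"type": "text", "text": json.dumps(summary)}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


if __name__ == "__main__":
    request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                          "params": {"name": "hard_disk_mc", "arguments": {"sweeps": 20}}})
    print(handle_request(request))
```

The design choice being illustrated is the separation the abstract describes: the simulation logic lives in an ordinary Python function, and a thin tool layer turns structured requests from an agent into calls on it. A production system would use a real MCP server implementation and a real HPMC engine rather than this toy loop.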