🤖 AI Summary
This work addresses the high cost and privacy risks associated with using closed-source large language models to generate SimPy-based queueing simulation code, as well as the poor performance of general-purpose open-source code models in this specialized domain. To overcome these limitations, we propose a three-stage progressive fine-tuning framework that integrates supervised fine-tuning (SFT) and direct preference optimization (DPO) to adapt Qwen-Coder-7B and DeepSeek-Coder-6.7B to the SimPy domain. We also construct the first SimPy-specific dataset and introduce an execution-based validation mechanism. The resulting models significantly outperform baseline approaches in code executability, syntactic correctness, and instruction adherence, achieving, for the first time, efficient and reliable open-source generation of SimPy simulation code and thereby offering a viable alternative for education, research, and decision support.
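The summary mentions an execution-based validation mechanism without detailing it. A minimal sketch of such a check, under the assumption that validation means running each generated script in a subprocess and keeping only those that exit cleanly within a time limit (the paper's actual mechanism may inspect more, e.g. output format; the function name here is hypothetical):

```python
import subprocess
import sys
import tempfile

def passes_execution_check(code: str, timeout: float = 10.0) -> bool:
    """Return True if a generated script runs to completion without error.

    Hypothetical filter: writes the candidate code to a temp file,
    executes it with the current interpreter, and treats a zero exit
    status within the timeout as passing.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # Hanging scripts (e.g. an unbounded simulation loop) are rejected.
        return False

print(passes_execution_check("print('ok')"))  # True: script runs cleanly
print(passes_execution_check("1/0"))          # False: raises ZeroDivisionError
```

A filter of this kind could also serve to construct DPO preference pairs, ranking executable completions above failing ones.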
📝 Abstract
The Python package SimPy is widely used for modeling queueing systems due to its flexibility, simplicity, and smooth integration with modern data analysis and optimization frameworks. Recent advances in large language models (LLMs) have demonstrated a strong ability to generate clear and executable code, making them well suited to writing SimPy queueing simulation code. However, directly employing closed-source models such as GPT-4o to generate such code may incur high computational costs and raise data privacy concerns. To address this, we fine-tune two open-source LLMs, Qwen-Coder-7B and DeepSeek-Coder-6.7B, on curated SimPy queueing data, which improves their code generation in executability, output-format compliance, and instruction-code consistency. Specifically, we propose a multi-stage fine-tuning framework comprising two stages of supervised fine-tuning (SFT) and one stage of direct preference optimization (DPO), progressively enhancing the models' ability to generate SimPy-based queueing simulation code. Extensive evaluations demonstrate that both fine-tuned models achieve substantial improvements in executability, output-format compliance, and instruction-code consistency. These results confirm that domain-specific fine-tuning can transform compact open-source code models into reliable SimPy simulation generators, providing a practical alternative to closed-source LLMs for education, research, and operational decision support.