🤖 AI Summary
Current large language models (LLMs) struggle to generate complete, structurally complex, and domain-intensive deep learning (DL) project code, primarily because they lack project-level coherent planning and domain-specific expertise. To address this, we propose DLCodeGen, the first planning-guided code generation framework designed specifically for end-to-end DL projects. It integrates a four-stage collaborative mechanism: (1) structured solution prediction, (2) semantic code retrieval, (3) template abstraction, and (4) comparative learning-based retrieval-augmented generation (RAG). By explicitly modeling project architecture and domain semantics, DLCodeGen significantly improves the completeness and correctness of generated code. Evaluated on a manually curated DL project dataset, DLCodeGen achieves a 9.7% gain in CodeBLEU and a 3.6% improvement in human evaluation, consistently outperforming existing baselines across multiple metrics.
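The four-stage mechanism can be pictured as a single orchestration function. The sketch below is a minimal, hypothetical illustration: the `LLM` and `CodeCorpus` interfaces, the prompt wording, and the `top_k` value are assumptions made for exposition, not the authors' actual implementation.

```python
from typing import Protocol


class LLM(Protocol):
    """Assumed text-in/text-out model interface (hypothetical)."""
    def generate(self, prompt: str) -> str: ...


class CodeCorpus(Protocol):
    """Assumed semantic retriever over a DL code corpus (hypothetical)."""
    def retrieve(self, query: str, top_k: int) -> list[str]: ...


def dlcodegen_sketch(requirement: str, llm: LLM, corpus: CodeCorpus) -> str:
    # Stage 1: predict a structured solution plan as global guidance.
    plan = llm.generate(
        f"Draft a structured solution plan for this DL project:\n{requirement}"
    )

    # Stage 2: retrieve code samples semantically analogous to the plan.
    samples = corpus.retrieve(query=plan, top_k=3)

    # Stage 3: abstract a reusable project-level code template
    # from the retrieved samples.
    template = llm.generate(
        "Abstract a project-level code template from these samples:\n"
        + "\n\n".join(samples)
    )

    # Stage 4: generate one draft per retrieval-augmented context, then have
    # the model compare the drafts against the plan to emit the final code.
    draft_a = llm.generate(f"{plan}\nReference samples:\n" + "\n\n".join(samples))
    draft_b = llm.generate(f"{plan}\nReference template:\n{template}")
    return llm.generate(
        "Compare the two drafts below against the requirement and plan, "
        "keep the stronger design choices, and emit the complete project code.\n"
        f"Requirement: {requirement}\nPlan: {plan}\n"
        f"Draft A:\n{draft_a}\nDraft B:\n{draft_b}"
    )
```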
📝 Abstract
While large language models (LLMs) have been widely applied to code generation, they struggle with generating entire deep learning projects, which are characterized by complex structures, longer functions, and stronger reliance on domain knowledge than general-purpose code. An open-domain LLM often lacks coherent contextual guidance and domain expertise for specific projects, making it challenging to produce complete code that fully meets user requirements. In this paper, we propose a novel planning-guided code generation method, DLCodeGen, tailored for generating deep learning projects. DLCodeGen predicts a structured solution plan, offering global guidance for LLMs to generate the project. The generated plan is then leveraged to retrieve semantically analogous code samples and subsequently abstract a code template. To effectively integrate these retrieval-augmented techniques, we design a comparative learning mechanism to generate the final code. We validate the effectiveness of our approach on a dataset we build for deep learning code generation. Experimental results demonstrate that DLCodeGen outperforms other baselines, achieving improvements of 9.7% in CodeBLEU and 3.6% in human evaluation metrics.
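For readers unfamiliar with the headline metric: CodeBLEU augments BLEU's n-gram matching with syntax (AST) and data-flow comparisons. A minimal way to score a generated snippet against a reference, assuming the open-source `codebleu` Python package (independent of this paper's own evaluation harness), looks like:

```python
# Illustrative CodeBLEU scoring with the open-source `codebleu` package
# (pip install codebleu); not the paper's evaluation pipeline.
from codebleu import calc_codebleu

reference = "def train(model, loader):\n    for batch in loader:\n        model.step(batch)"
prediction = "def train(model, data):\n    for batch in data:\n        model.step(batch)"

result = calc_codebleu(
    [reference], [prediction], lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),  # n-gram, weighted n-gram, AST, data flow
)
print(result["codebleu"])  # aggregate score in [0, 1]
```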