Automated Prompt Generation for Code Intelligence: An Empirical Study and Experience in WeChat

📅 2025-11-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large code models (LCMs) depend heavily on manual prompt engineering, and hand-written prompts generalize poorly across models and tasks and are hard to tune for black-box models. Method: This paper empirically studies two components of automated prompt generation, Instruction Generation (IG) and Multi-Step Reasoning (MSR), and proposes a novel approach that combines the best-performing method for each, requiring no access to model internals. IG supplies a task-related description that instructs the LCM, while MSR guides it to produce logical steps before the final answer. Contribution/Results: Across multiple open-source LCMs, the approach improves over basic prompts by an average of 28.38% in CodeBLEU on code translation, 58.11% in ROUGE-L on code summarization, and 84.53% in SuccessRate@1 on API recommendation. On the proprietary industrial dataset WeChat-Bench, it improves MRR for API recommendation by an average of 148.89%. The study also shows that IG and MSR each substantially improve performance over basic prompts, and that combining them gives a reusable way to automate prompt engineering for black-box LCMs.

📝 Abstract

Large Code Models (LCMs) show potential in code intelligence, but their effectiveness is greatly influenced by prompt quality. Current prompt design is mostly manual, which is time-consuming and highly dependent on specific LCMs and tasks. While automated prompt generation (APG) exists in NLP, it is underexplored for code intelligence. This creates a gap, as automating the prompt process is essential for developers facing diverse tasks and black-box LCMs. To mitigate this, we empirically investigate two important parts of APG: Instruction Generation (IG) and Multi-Step Reasoning (MSR). IG provides a task-related description to instruct LCMs, while MSR guides them to produce logical steps before the final answer. We evaluate widely used APG methods for each part on four open-source LCMs and three code intelligence tasks: code translation (PL-PL), code summarization (PL-NL), and API recommendation (NL-PL). Experimental results indicate that both IG and MSR dramatically enhance performance compared to basic prompts. Based on these results, we propose a novel APG approach combining the best methods of the two parts. Experiments show our approach achieves average improvements of 28.38% in CodeBLEU (code translation), 58.11% in ROUGE-L (code summarization), and 84.53% in SuccessRate@1 (API recommendation) over basic prompts. To validate its effectiveness in an industrial scenario, we evaluate our approach on WeChat-Bench, a proprietary dataset, achieving an average MRR improvement of 148.89% for API recommendation.
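To make the two components concrete, here is a minimal sketch of how an instruction (the IG part) and a reasoning cue (the MSR part) could be combined into one prompt for a black-box model. This is an illustration under stated assumptions, not the paper's method: the function names `build_prompt` and `extract_answer`, the wording of the reasoning cue, and the `Answer:` marker are all hypothetical, and the instruction is a hand-written placeholder where an IG method would normally supply generated text.

```python
def build_prompt(task_input: str, instruction: str, use_reasoning: bool = True) -> str:
    """Assemble a prompt from an instruction (IG) and an optional reasoning cue (MSR).

    Hypothetical helper: the layout and cue wording are assumptions for illustration.
    """
    parts = [instruction.strip(), "", "Input:", task_input.strip(), ""]
    if use_reasoning:
        # MSR-style cue: ask for intermediate steps before the final answer.
        parts.append(
            "Let's think step by step, then give the final answer "
            "on a line starting with 'Answer:'."
        )
    else:
        # Basic prompt: ask for the answer directly.
        parts.append("Answer:")
    return "\n".join(parts)


def extract_answer(model_output: str) -> str:
    """Return the text after the last 'Answer:' marker, or the whole output if absent."""
    marker = "Answer:"
    idx = model_output.rfind(marker)
    if idx == -1:
        return model_output.strip()
    return model_output[idx + len(marker):].strip()


if __name__ == "__main__":
    prompt = build_prompt(
        task_input="def add(a, b): return a + b",
        instruction="Translate the following Python function to Java.",
    )
    print(prompt)
```

Because the prompt is plain text, the same helper works with any model reachable only through a text interface, which is the black-box setting the abstract describes.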

Problem

Research questions and friction points this paper is trying to address.

Automating prompt generation for code intelligence tasks

Addressing manual prompt design limitations in Large Code Models

Enhancing LCM performance through instruction generation and reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated prompt generation for code intelligence tasks

Combining instruction generation with multi-step reasoning

Empirically validated on multiple code models and datasets
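The reported gains are relative improvements over a basic-prompt baseline, measured for API recommendation with ranking metrics such as SuccessRate@1 and MRR. The sketch below shows one common way to compute these quantities. It assumes a single correct API per query, which may differ from the paper's exact definitions, and all function names are hypothetical.

```python
from typing import List


def success_rate_at_k(ranked: List[List[str]], gold: List[str], k: int = 1) -> float:
    """Fraction of queries whose correct API appears in the top-k predictions."""
    hits = sum(1 for preds, g in zip(ranked, gold) if g in preds[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked: List[List[str]], gold: List[str]) -> float:
    """Average of 1/rank of the correct API; a query with no hit contributes 0."""
    total = 0.0
    for preds, g in zip(ranked, gold):
        if g in preds:
            total += 1.0 / (preds.index(g) + 1)
    return total / len(gold)


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative toy data, not taken from the paper.
    ranked = [["a", "b"], ["c", "d"], ["e", "f"]]
    gold = ["a", "d", "z"]
    print(success_rate_at_k(ranked, gold, k=1))
    print(mean_reciprocal_rank(ranked, gold))
    print(relative_improvement(2.0, 5.0))
```

Note that a relative improvement above 100% means the metric more than doubled compared with the baseline.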
