Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of actionable guidelines for crafting effective prompts in code generation tasks, which hinders developers’ ability to optimize large language model outputs. To bridge this gap, the authors propose a test-driven approach to automatic prompt optimization, integrating qualitative content analysis with user studies through iterative experimentation to identify key elements that enhance prompt efficacy. Based on empirical findings, they formulate ten structured guidelines specifically tailored for code generation prompts. These guidelines were validated through evaluations involving 50 developers, demonstrating both their effectiveness and practical utility. The research further uncovers a notable discrepancy between developers’ actual usage patterns and their perceived usefulness of prompt strategies, offering actionable insights for practitioners, educators, and tool designers in the software development ecosystem.

📝 Abstract
Large Language Models (LLMs) are nowadays extensively used for various types of software engineering tasks, primarily code generation. Previous research has shown that suitable prompt engineering can help developers improve their code generation prompts. However, no specific guidelines yet exist to drive developers towards writing suitable prompts for code generation. In this work, we derive and evaluate development-specific prompt optimization guidelines. First, we use an iterative, test-driven approach to automatically refine code generation prompts, and we analyze the outcome of this process to identify prompt improvement items that lead to test passes. We use these elements to elicit 10 guidelines for prompt improvement, related to better specifying I/O and pre-/post-conditions, providing examples and various types of details, or clarifying ambiguities. We conduct an assessment with 50 practitioners, who report both their usage of the elicited prompt improvement patterns and their perceived usefulness; perceived usefulness does not always match actual usage prior to knowing our guidelines. Our results have implications not only for practitioners and educators, but also for those aiming to create better LLM-aided software development tools.
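The iterative, test-driven refinement loop the abstract describes can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: `generate_code` is a stub standing in for an LLM call, and the single test case, the refinement heuristic, and all function names are assumptions made for the example.

```python
# Hypothetical sketch of a test-driven prompt optimization loop:
# generate code from a prompt, run tests, and refine the prompt
# with guideline-style clarifications (specify I/O, give an example)
# until the tests pass or an iteration budget is exhausted.

def run_tests(code: str) -> list[str]:
    """Execute the candidate code and return failing-test messages."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        failures = []
        if namespace["add"](2, 3) != 5:
            failures.append("add(2, 3) should return 5")
        return failures
    except Exception as exc:
        return [f"execution error: {exc}"]

def generate_code(prompt: str) -> str:
    # Stand-in for an LLM: produces a correct implementation only once
    # the prompt specifies the expected input/output behavior.
    if "returns their sum" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"  # deliberately wrong first draft

def refine_prompt(prompt: str, failures: list[str]) -> str:
    # Append a clarification derived from the failing tests, mirroring
    # the guideline of specifying I/O and providing an example.
    return prompt + " It returns their sum, e.g. add(2, 3) == 5."

def optimize(prompt: str, max_iters: int = 3) -> tuple[str, str]:
    code = ""
    for _ in range(max_iters):
        code = generate_code(prompt)
        failures = run_tests(code)
        if not failures:
            break
        prompt = refine_prompt(prompt, failures)
    return prompt, code

final_prompt, final_code = optimize("Write a function add(a, b).")
```

The loop terminates as soon as all tests pass; the prompt edits accumulated along the way are the "improvement items" the paper analyzes to derive its guidelines.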
Problem

Research questions and friction points this paper is trying to address.

prompt engineering
code generation
large language models
software engineering
developer guidelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt engineering
code generation
large language models
empirical study
software development