Which Prompting Technique Should I Use? An Empirical Investigation of Prompting Techniques for Software Engineering Tasks

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

career value

161K/year

🤖 AI Summary

Prior work lacks systematic, cross-model evaluation of prompt engineering techniques for software engineering (SE) tasks. Method: This study conducts an empirical analysis of 14 prompt engineering techniques across 10 SE tasks—including code generation, bug fixing, and code question answering—spanning six paradigms: zero-shot, few-shot, chain-of-thought (CoT), ensemble, self-critique, and decomposition. Experiments are performed on LLaMA-3, CodeLlama, GPT-4, and Claude-3. Contribution/Results: We introduce the first multi-dimensional, task-aware prompt engineering benchmark framework tailored to SE. Our analysis reveals a strong correlation between task logical complexity and prompt strategy efficacy: CoT and decomposition improve accuracy by +18.7% on high-reasoning tasks, whereas few-shot excels on context-sensitive tasks. We propose a principled prompt selection guideline grounded in linguistic features and resource overhead (latency/token cost), and publicly release a reusable decision table and overhead evaluation toolkit.

Technology Category

Application Category

📝 Abstract

A growing variety of prompt engineering techniques has been proposed for Large Language Models (LLMs), yet systematic evaluation of each technique on individual software engineering (SE) tasks remains underexplored. In this study, we present a systematic evaluation of 14 established prompt techniques across 10 SE tasks using four LLM models. As identified in the prior literature, the selected prompting techniques span six core dimensions (Zero-Shot, Few-Shot, Thought Generation, Ensembling, Self-Criticism, and Decomposition). They are evaluated on tasks such as code generation, bug fixing, and code-oriented question answering, to name a few. Our results show which prompting techniques are most effective for SE tasks requiring complex logic and intensive reasoning versus those that rely more on contextual understanding and example-driven scenarios. We also analyze correlations between the linguistic characteristics of prompts and the factors that contribute to the effectiveness of prompting techniques in enhancing performance on SE tasks. Additionally, we report the time and token consumption for each prompting technique when applied to a specific task and model, offering guidance for practitioners in selecting the optimal prompting technique for their use cases.

Problem

Research questions and friction points this paper is trying to address.

Evaluating prompting techniques for software engineering tasks

Identifying effective techniques for logic-intensive versus context-driven tasks

Analyzing linguistic characteristics and performance factors of prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic evaluation of 14 prompt techniques

Analysis across 10 software engineering tasks

Comparison of time and token consumption

🔎 Similar Papers

The Prompt Report: A Systematic Survey of Prompting Techniques