Which Prompting Technique Should I Use? An Empirical Investigation of Prompting Techniques for Software Engineering Tasks

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work lacks systematic, cross-model evaluation of prompt engineering techniques for software engineering (SE) tasks. Method: This study conducts an empirical analysis of 14 prompt engineering techniques across 10 SE tasks—including code generation, bug fixing, and code question answering—spanning six paradigms: zero-shot, few-shot, chain-of-thought (CoT), ensemble, self-critique, and decomposition. Experiments are performed on LLaMA-3, CodeLlama, GPT-4, and Claude-3. Contribution/Results: We introduce the first multi-dimensional, task-aware prompt engineering benchmark framework tailored to SE. Our analysis reveals a strong correlation between task logical complexity and prompt strategy efficacy: CoT and decomposition improve accuracy by +18.7% on high-reasoning tasks, whereas few-shot excels on context-sensitive tasks. We propose a principled prompt selection guideline grounded in linguistic features and resource overhead (latency/token cost), and publicly release a reusable decision table and overhead evaluation toolkit.
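The six paradigms named in the summary differ mainly in how the prompt text is assembled before it is sent to the model. A minimal sketch of four of them (zero-shot, few-shot, chain-of-thought, decomposition), using bug fixing as the task; all template wording is hypothetical, not taken from the paper:

```python
# Illustrative prompt construction for four paradigms (hypothetical templates).

BUGGY_CODE = "def add(a, b):\n    return a - b  # bug: should be a + b"

def zero_shot(code):
    # Zero-shot: task instruction only, no examples or reasoning scaffold.
    return f"Fix the bug in this function:\n{code}"

def few_shot(code, examples):
    # Few-shot: prepend (buggy, fixed) demonstration pairs before the query.
    demos = "\n\n".join(f"Buggy:\n{b}\nFixed:\n{f}" for b, f in examples)
    return f"{demos}\n\nBuggy:\n{code}\nFixed:"

def chain_of_thought(code):
    # Thought generation (CoT): ask the model to reason before answering.
    return f"Fix the bug in this function. Think step by step:\n{code}"

def decomposition(code):
    # Decomposition: split the task into explicit sub-steps.
    return ("Step 1: Describe what the function should do.\n"
            "Step 2: Locate the faulty line.\n"
            "Step 3: Rewrite only that line.\n"
            f"{code}")

examples = [("def sq(x): return x + x", "def sq(x): return x * x")]
prompts = {
    "zero-shot": zero_shot(BUGGY_CODE),
    "few-shot": few_shot(BUGGY_CODE, examples),
    "cot": chain_of_thought(BUGGY_CODE),
    "decomposition": decomposition(BUGGY_CODE),
}
for name, p in prompts.items():
    print(f"--- {name} ---\n{p}\n")
```

Ensemble and self-critique paradigms build on the same idea at the call level (sampling several completions, or feeding a completion back for revision) rather than in the prompt template itself.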

📝 Abstract
A growing variety of prompt engineering techniques has been proposed for Large Language Models (LLMs), yet systematic evaluation of each technique on individual software engineering (SE) tasks remains underexplored. In this study, we present a systematic evaluation of 14 established prompting techniques across 10 SE tasks using four LLMs. As identified in the prior literature, the selected prompting techniques span six core dimensions (Zero-Shot, Few-Shot, Thought Generation, Ensembling, Self-Criticism, and Decomposition). They are evaluated on tasks such as code generation, bug fixing, and code-oriented question answering. Our results show which prompting techniques are most effective for SE tasks requiring complex logic and intensive reasoning versus those that rely more on contextual understanding and example-driven scenarios. We also analyze correlations between the linguistic characteristics of prompts and the factors that contribute to the effectiveness of prompting techniques in enhancing performance on SE tasks. Additionally, we report the time and token consumption for each prompting technique when applied to a specific task and model, offering guidance for practitioners in selecting the optimal prompting technique for their use cases.
Problem

Research questions and friction points this paper is trying to address.

Evaluating prompting techniques for software engineering tasks
Identifying effective techniques for logic-intensive versus context-driven tasks
Analyzing linguistic characteristics and performance factors of prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic evaluation of 14 prompt techniques
Analysis across 10 software engineering tasks
Comparison of time and token consumption
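The time-and-token comparison above amounts to per-call bookkeeping. A sketch of what that accounting could look like; the model call is a stub and the whitespace token count is a crude stand-in for a real tokenizer, so the numbers are illustrative only:

```python
import time

def count_tokens(text):
    # Naive whitespace split; a real setup would use the model's tokenizer.
    return len(text.split())

def fake_model(prompt):
    # Stub standing in for an actual LLM call.
    return "return a + b"

def measure(technique_name, prompt):
    # Record latency and token overhead for one prompt/completion pair.
    start = time.perf_counter()
    completion = fake_model(prompt)
    latency = time.perf_counter() - start
    return {
        "technique": technique_name,
        "prompt_tokens": count_tokens(prompt),
        "completion_tokens": count_tokens(completion),
        "latency_s": latency,
    }

row = measure("zero-shot", "Fix the bug in: def add(a,b): return a - b")
print(row)
```

Aggregating such rows per technique, task, and model yields exactly the kind of overhead table the paper reports.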
Authors
Enio Garcia Santana Junior
Federal University of Bahia (UFBA), Brazil
Gabriel Benjamin
Federal University of Bahia (UFBA), Brazil
Melissa Araujo
Federal University of Bahia (UFBA), Brazil
Harrison Santos
Federal University of Bahia (UFBA), Brazil
David Freitas
Federal University of Bahia (UFBA), Brazil
Eduardo Almeida
Professor, Institute of Computing, Federal University of Bahia (IC-UFBA)
Software Engineering · SE4AI · AI4SE · Software Reuse
Paulo Anselmo da Mota Silveira Neto
Federal Rural University of Pernambuco (UFRPE), Brazil
Jiawei Li
University of California, Irvine (UCI), USA
Jina Chun
University of California, Irvine (UCI), USA
Iftekhar Ahmed
Associate Professor, University of California, Irvine
Software Engineering · Software Testing · Machine Learning