Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies how general-purpose AI (GPAI) systems used for software engineering decision support are swayed by cognitive biases induced purely by prompt wording, such as anchoring, bandwagon, and framing effects, which can push them toward suboptimal decisions. The authors measure this with PROBE-SWE, a dynamic benchmark that pairs biased and unbiased versions of the same software engineering dilemmas while controlling for logic and difficulty. Common off-the-shelf strategies such as chain-of-thought and self-debiasing yield no statistically significant per-bias reductions in bias sensitivity. The authors therefore propose an end-to-end method that elicits software engineering best practices and injects them into the prompt as axiomatic reasoning cues before the model answers, which reduces overall bias sensitivity by 51% on average (p < .001). A thematic analysis additionally surfaces linguistic patterns associated with heightened bias sensitivity, clarifying when GPAI use is less advisable for software engineering decision support.
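The summary refers to "bias sensitivity" without defining it. One plausible reading, sketched below, is the fraction of paired dilemmas on which a model's decision changes when only the wording becomes biased. This is an assumption for illustration, not the paper's stated metric; `DilemmaPair`, `bias_sensitivity`, `relative_reduction`, and the `decide` callable are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class DilemmaPair:
    """Two wordings of the same dilemma (hypothetical container)."""
    unbiased_prompt: str
    biased_prompt: str


def bias_sensitivity(pairs: Sequence[DilemmaPair],
                     decide: Callable[[str], str]) -> float:
    """Share of pairs where the decision differs between the two wordings.

    Assumption: sensitivity is a simple flip rate. `decide` stands in for
    any model call that maps a prompt to a discrete decision label.
    """
    if not pairs:
        raise ValueError("need at least one dilemma pair")
    flipped = sum(
        1 for p in pairs
        if decide(p.unbiased_prompt) != decide(p.biased_prompt)
    )
    return flipped / len(pairs)


def relative_reduction(baseline: float, mitigated: float) -> float:
    """Relative drop in sensitivity, e.g. 0.5 means a 50% reduction."""
    if baseline <= 0:
        raise ValueError("baseline sensitivity must be positive")
    return (baseline - mitigated) / baseline
```

Under this reading, a reported average reduction would correspond to `relative_reduction` computed between a baseline prompt and a mitigated prompt over the same pairs.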

📝 Abstract

Prompt-induced cognitive biases are changes in a general-purpose AI (GPAI) system's decisions caused solely by biased wording in the input (e.g., framing, anchors), not task logic. In software engineering (SE) decision support, where problem statements and requirements are written in natural language, small phrasing shifts (e.g., popularity hints or outcome reveals) can push GPAI models toward suboptimal decisions. We study this with PROBE-SWE, a dynamic benchmark for SE that pairs biased and unbiased versions of the same SE dilemmas, controls for logic and difficulty, and targets eight SE-relevant biases (anchoring, availability, bandwagon, confirmation, framing, hindsight, hyperbolic discounting, overconfidence). We ask whether prompt engineering mitigates bias sensitivity in practice, focusing on actionable techniques that practitioners can apply off-the-shelf in real environments. Testing common strategies (e.g., chain-of-thought, self-debiasing) on cost-effective GPAI systems, we find no statistically significant reductions in bias sensitivity on a per-bias basis. We then adopt a Prolog-style view of the reasoning process: solving SE dilemmas requires making explicit any background axioms and inference assumptions (i.e., SE best practices) that are usually implicit in the prompt. We therefore hypothesize that bias-inducing features short-circuit assumption elicitation, pushing GPAI models toward biased shortcuts. Building on this, we introduce an end-to-end method that elicits best practices and injects axiomatic reasoning cues into the prompt before answering, reducing overall bias sensitivity by 51% on average (p < .001). Finally, we report a thematic analysis that surfaces linguistic patterns associated with heightened bias sensitivity, clarifying when GPAI use is less advisable for SE decision support and where to focus future countermeasures.
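The elicit-then-inject idea described in the abstract could be sketched as a two-step prompt pipeline: first ask the model for relevant best practices, then answer with those practices listed as explicit axioms. The abstract does not give prompt templates or an API, so the wording of `ELICIT_TEMPLATE`, the helper names, and the `ask` callable (a stand-in for any model call) are all assumptions made for illustration.

```python
import re
from typing import Callable, List

# Hypothetical elicitation wording; the paper's actual prompts are not shown here.
ELICIT_TEMPLATE = (
    "List the software engineering best practices relevant to the dilemma "
    "below, one per line. Do not decide yet.\n\nDilemma:\n{dilemma}"
)

# Matches a leading "-", "*", "1." or "1)" list marker.
_BULLET = re.compile(r"^\s*(?:[-*]|\d+[.)])\s*")


def parse_axioms(raw: str) -> List[str]:
    """Turn a line-per-practice reply into a clean list of axioms."""
    axioms = []
    for line in raw.splitlines():
        cleaned = _BULLET.sub("", line).strip()
        if cleaned:
            axioms.append(cleaned)
    return axioms


def build_axiom_prompt(dilemma: str, axioms: List[str]) -> str:
    """Place the elicited practices before the dilemma as numbered axioms."""
    lines = ["Background axioms (software engineering best practices):"]
    lines += [f"{i}. {axiom}" for i, axiom in enumerate(axioms, start=1)]
    lines += [
        "",
        "Dilemma:",
        dilemma,
        "",
        "Decide using only the axioms and the facts of the dilemma.",
    ]
    return "\n".join(lines)


def answer_with_axioms(dilemma: str, ask: Callable[[str], str]) -> str:
    """Step 1: elicit best practices. Step 2: answer with them injected."""
    raw = ask(ELICIT_TEMPLATE.format(dilemma=dilemma))
    return ask(build_axiom_prompt(dilemma, parse_axioms(raw)))
```

Keeping elicitation and answering as separate calls mirrors the abstract's hypothesis that biased wording short-circuits the step where assumptions are made explicit. Whether the elicitation step should see the biased wording at all is a design choice this sketch leaves open.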

Problem

Research questions and friction points this paper is trying to address.

prompt-induced bias

cognitive bias

software engineering

AI decision support

natural language prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt-induced bias

axiomatic reasoning

software engineering decision support