A Taxonomy of Prompt Defects in LLM Systems

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the largely empirical, trial-and-error nature of large language model (LLM) prompt design, which frequently leads to unsafe or erroneous behavior. We propose the first systematic taxonomy of prompt defects tailored to software engineering, grounded in empirical analysis and root-cause modeling. The taxonomy spans six dimensions: specification and intent, input and content, structure and formatting, context and memory, performance and efficiency, and maintainability and engineering. It establishes explicit mappings among defect types, their impacts, and corresponding mitigation strategies. Building on this foundation, we integrate prompt engineering patterns, automated safeguards, testing frameworks, and evaluation tools into an end-to-end defect mitigation system. Our contribution is the first comprehensive knowledge framework for ensuring LLM prompt reliability, enabling rigorous, engineering-driven design and verification of trustworthy LLM-based systems.
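The "automated safeguards" mentioned in the summary map naturally onto runtime output validation. The paper itself does not ship code, so the following is only a minimal Python sketch of one such safeguard for a structure-and-formatting defect: it checks that a model reply parses as JSON with the expected keys and retries with a repair hint otherwise. `call_llm`, the key set, and the retry policy are all hypothetical.

```python
import json


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; wire in your own client here."""
    raise NotImplementedError


def guarded_json_call(prompt: str, required_keys: set[str], max_retries: int = 2) -> dict:
    """Validate that the model reply is a JSON object with the expected keys,
    retrying with an explicit repair hint when validation fails."""
    hint = ""
    for _ in range(max_retries + 1):
        reply = call_llm(prompt + hint)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            hint = f"\n\nYour previous reply was not valid JSON ({exc}). Reply with JSON only."
            continue
        if not isinstance(data, dict):
            hint = "\n\nReply with a single JSON object, not a list or scalar."
            continue
        missing = required_keys - data.keys()
        if missing:
            hint = f"\n\nYour previous reply was missing the keys {sorted(missing)}."
            continue
        return data
    raise ValueError("model output failed JSON validation after retries")
```

A production guardrail would also bound token cost across retries, which touches the paper's performance-and-efficiency dimension.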

📝 Abstract
Large Language Models (LLMs) have become key components of modern software, with prompts acting as their de facto programming interface. However, prompt design remains largely empirical, and small mistakes can cascade into unreliable, insecure, or inefficient behavior. This paper presents the first systematic survey and taxonomy of prompt defects: recurring ways in which prompts fail to elicit their intended behavior from LLMs. We organize defects along six dimensions: (1) Specification and Intent, (2) Input and Content, (3) Structure and Formatting, (4) Context and Memory, (5) Performance and Efficiency, and (6) Maintainability and Engineering. Each dimension is refined into fine-grained subtypes, illustrated with concrete examples and root-cause analysis. Grounded in software engineering principles, we show how these defects surface in real development workflows and examine their downstream effects. For every subtype, we distill mitigation strategies that span emerging prompt engineering patterns, automated guardrails, testing harnesses, and evaluation frameworks. We then summarize these strategies in a master taxonomy that links each defect to its impact and remedy. We conclude with open research challenges and a call for rigorous, engineering-oriented methodologies to ensure that LLM-driven systems are dependable by design.
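The "master taxonomy that links each defect to its impact and remedy" can be read as a simple relational structure. As an illustration only (the entries below are invented for the example, not rows from the paper), one might encode such a mapping like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectEntry:
    dimension: str  # one of the paper's six dimensions
    subtype: str    # fine-grained defect type
    impact: str     # downstream effect
    remedy: str     # mitigation strategy

# Hypothetical sample entries, not taken from the paper.
TAXONOMY = [
    DefectEntry(
        dimension="Structure and Formatting",
        subtype="unparseable output schema",
        impact="downstream parser crashes",
        remedy="schema-validation guardrail with retry",
    ),
    DefectEntry(
        dimension="Context and Memory",
        subtype="stale conversation context",
        impact="contradictory answers across turns",
        remedy="context-window budgeting and summarization",
    ),
]


def remedies_for(dimension: str) -> list[str]:
    """Look up mitigation strategies for a given taxonomy dimension."""
    return [e.remedy for e in TAXONOMY if e.dimension == dimension]
```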
Problem

Research questions and friction points this paper aims to address.

Classifying recurring prompt defects in LLM systems
Analyzing how prompts fail to elicit intended LLM behavior
Providing mitigation strategies for prompt engineering challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic survey and taxonomy of prompt defects
Organizes defects along six refined dimensions
Distills mitigation strategies for each defect subtype
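As an illustration of the "testing harness" style of mitigation the survey distills, here is a minimal pytest sketch that treats a prompt as a contract. The prompt text, `call_llm`, and the specific assertions are hypothetical; the point is the pattern of asserting both output structure and intent.

```python
import json

import pytest

# Hypothetical prompt under test: a one-sentence bug-report summarizer.
SUMMARY_PROMPT = (
    "Summarize the following bug report in one sentence and return JSON "
    'with the keys "summary" and "severity" (low/medium/high).\n\n{report}'
)


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError


@pytest.mark.parametrize("report", [
    "App crashes when the settings page is opened twice.",
    "Typo on the login button.",
])
def test_summary_prompt_contract(report):
    reply = call_llm(SUMMARY_PROMPT.format(report=report))
    data = json.loads(reply)                     # structure-and-formatting check
    assert set(data) == {"summary", "severity"}  # schema contract
    assert data["severity"] in {"low", "medium", "high"}
    assert len(data["summary"].split()) <= 40    # specification-and-intent check
```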
Haoye Tian
Assistant Professor, Aalto University
Software Engineering, Machine Learning, Program Repair, AI4SE, LLM4SE
Chong Wang
School of Computer Science and Engineering, Nanyang Technological University, Singapore
BoYang Yang
Jisuan Institute of Technology, Beijing JudaoYouda Network Technology Co. Ltd., China
Lyuye Zhang
Postdoc, Nanyang Technological University
Program Analysis, Open Source, Open Source Security, Software Supply Chain, Software Maintenance
Yang Liu
School of Computer Science and Engineering, Nanyang Technological University, Singapore