A Taxonomy of Prompt Defects in LLM Systems

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the largely empirical, trial-and-error nature of large language model (LLM) prompt design, which frequently leads to unsafe or erroneous behavior. We propose the first systematic taxonomy of prompt defects tailored to software engineering, grounded in empirical analysis and root-cause modeling. The taxonomy spans six dimensions: specification and intent, input and content, structure and formatting, context and memory, performance and efficiency, and maintainability and engineering. It establishes explicit mappings among defect types, their impacts, and corresponding mitigation strategies. Building on this foundation, we integrate prompt engineering patterns, automated safeguards, testing frameworks, and evaluation tools into an end-to-end defect mitigation system. Our contribution is the first comprehensive knowledge framework for ensuring LLM prompt reliability, enabling rigorous, engineering-driven design and verification of trustworthy LLM-based systems.
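The "automated safeguards" mentioned in the summary map naturally onto runtime output validation. The paper itself does not ship code, so the following is only a minimal Python sketch of one such safeguard for a structure-and-formatting defect: it checks that a model reply parses as JSON with the expected keys and retries with a repair hint otherwise. `call_llm`, the key set, and the retry policy are all hypothetical.

```python
import json


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; wire in your own client here."""
    raise NotImplementedError


def guarded_json_call(prompt: str, required_keys: set[str], max_retries: int = 2) -> dict:
    """Validate that the model reply is a JSON object with the expected keys,
    retrying with an explicit repair hint when validation fails."""
    hint = ""
    for _ in range(max_retries + 1):
        reply = call_llm(prompt + hint)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            hint = f"\n\nYour previous reply was not valid JSON ({exc}). Reply with JSON only."
            continue
        if not isinstance(data, dict):
            hint = "\n\nReply with a single JSON object, not a list or scalar."
            continue
        missing = required_keys - data.keys()
        if missing:
            hint = f"\n\nYour previous reply was missing the keys {sorted(missing)}."
            continue
        return data
    raise ValueError("model output failed JSON validation after retries")
```

A production guardrail would also bound token cost across retries, which touches the paper's performance-and-efficiency dimension.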

📝 Abstract
Large Language Models (LLMs) have become key components of modern software, with prompts acting as their de facto programming interface. However, prompt design remains largely empirical, and small mistakes can cascade into unreliable, insecure, or inefficient behavior. This paper presents the first systematic survey and taxonomy of prompt defects: recurring ways in which prompts fail to elicit their intended behavior from LLMs. We organize defects along six dimensions: (1) Specification and Intent, (2) Input and Content, (3) Structure and Formatting, (4) Context and Memory, (5) Performance and Efficiency, and (6) Maintainability and Engineering. Each dimension is refined into fine-grained subtypes, illustrated with concrete examples and root-cause analysis. Grounded in software engineering principles, we show how these defects surface in real development workflows and examine their downstream effects. For every subtype, we distill mitigation strategies that span emerging prompt engineering patterns, automated guardrails, testing harnesses, and evaluation frameworks. We then summarize these strategies in a master taxonomy that links each defect to its impact and remedy. We conclude with open research challenges and a call for rigorous, engineering-oriented methodologies to ensure that LLM-driven systems are dependable by design.
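The "master taxonomy that links each defect to its impact and remedy" can be read as a simple relational structure. As an illustration only (the entries below are invented for the example, not rows from the paper), one might encode such a mapping like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectEntry:
    dimension: str  # one of the paper's six dimensions
    subtype: str    # fine-grained defect type
    impact: str     # downstream effect
    remedy: str     # mitigation strategy

# Hypothetical sample entries, not taken from the paper.
TAXONOMY = [
    DefectEntry(
        dimension="Structure and Formatting",
        subtype="unparseable output schema",
        impact="downstream parser crashes",
        remedy="schema-validation guardrail with retry",
    ),
    DefectEntry(
        dimension="Context and Memory",
        subtype="stale conversation context",
        impact="contradictory answers across turns",
        remedy="context-window budgeting and summarization",
    ),
]


def remedies_for(dimension: str) -> list[str]:
    """Look up mitigation strategies for a given taxonomy dimension."""
    return [e.remedy for e in TAXONOMY if e.dimension == dimension]
```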
Problem

Research questions and friction points this paper aims to address.

Classifying recurring prompt defects in LLM systems
Analyzing how prompts fail to elicit intended LLM behavior
Providing mitigation strategies for prompt engineering challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic survey and taxonomy of prompt defects
Organizes defects along six refined dimensions
Distills mitigation strategies for each defect subtype
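As an illustration of the "testing harness" style of mitigation the survey distills, here is a minimal pytest sketch that treats a prompt as a contract. The prompt text, `call_llm`, and the specific assertions are hypothetical; the point is the pattern of asserting both output structure and intent.

```python
import json

import pytest

# Hypothetical prompt under test: a one-sentence bug-report summarizer.
SUMMARY_PROMPT = (
    "Summarize the following bug report in one sentence and return JSON "
    'with the keys "summary" and "severity" (low/medium/high).\n\n{report}'
)


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError


@pytest.mark.parametrize("report", [
    "App crashes when the settings page is opened twice.",
    "Typo on the login button.",
])
def test_summary_prompt_contract(report):
    reply = call_llm(SUMMARY_PROMPT.format(report=report))
    data = json.loads(reply)                     # structure-and-formatting check
    assert set(data) == {"summary", "severity"}  # schema contract
    assert data["severity"] in {"low", "medium", "high"}
    assert len(data["summary"].split()) <= 40    # specification-and-intent check
```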
Haoye Tian
Assistant Professor, Aalto University
Software Engineering, Machine Learning, Program Repair, AI4SE, LLM4SE
Chong Wang
School of Computer Science and Engineering, Nanyang Technological University, Singapore
BoYang Yang
Jisuan Institute of Technology, Beijing JudaoYouda Network Technology Co. Ltd., China
Lyuye Zhang
Postdoc, Nanyang Technological University
Program Analysis, Open Source, Open Source Security, Software Supply Chain, Software Maintenance
Yang Liu
School of Computer Science and Engineering, Nanyang Technological University, Singapore