What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the pervasive “ill-defined requirements” problem in natural language prompts for LLM-based software development, where developer-authored prompts fail to capture users’ critical requirements, resulting in brittle model behavior that degrades under model updates or minor prompt perturbations (accuracy drops of up to 20%; twice the baseline probability of regression). To tackle this, the authors propose a *requirements-aware prompt optimization* paradigm that moves beyond conventional heuristic prompt engineering, and introduce a full-lifecycle prompt management framework covering requirement elicitation, evaluation, and continuous monitoring. They additionally design an evidence-driven prompt robustness benchmark, a constraint-aware optimization algorithm, and a multi-dimensional stress-testing protocol. Empirical evaluation demonstrates an average 4.8% improvement in task accuracy, alongside substantial gains in prompt stability and cross-context generalization.

📝 Abstract
Building LLM-powered software requires developers to communicate their requirements through natural language, but developer prompts are frequently underspecified, failing to fully capture many user-important requirements. In this paper, we present an in-depth analysis of prompt underspecification, showing that while LLMs can often (41.1%) guess unspecified requirements by default, such behavior is less robust: Underspecified prompts are 2x more likely to regress over model or prompt changes, sometimes with accuracy drops of more than 20%. We then demonstrate that simply adding more requirements to a prompt does not reliably improve performance, due to LLMs' limited instruction-following capabilities and competing constraints, and standard prompt optimizers do not offer much help. To address this, we introduce novel requirements-aware prompt optimization mechanisms that can improve performance by 4.8% on average over baselines that naively specify everything in the prompt. Beyond prompt optimization, we envision that effectively managing prompt underspecification requires a broader process, including proactive requirements discovery, evaluation, and monitoring.
Problem

Research questions and friction points this paper is trying to address.

Understanding underspecification in LLM prompts and its impact
Addressing limited effectiveness of adding more requirements to prompts
Introducing requirements-aware optimization to improve prompt performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyze prompt underspecification impact on LLMs
Introduce requirements-aware prompt optimization mechanisms
Propose proactive requirements discovery and monitoring
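The idea of requirements-aware prompt optimization can be illustrated with a minimal sketch: make requirements explicit as executable checks, score candidate prompts against them, and keep the best-scoring prompt rather than naively packing every requirement into the prompt text. All names here (`call_llm`, `REQUIREMENTS`, the candidate prompts) are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical sketch of requirements-aware prompt selection.
# `call_llm` is a placeholder for a real model API; replace it with an
# actual client call. Requirements are expressed as named predicate checks.

def call_llm(prompt: str, user_input: str) -> str:
    # Placeholder model: produces a fixed-format summary regardless of
    # the prompt. A real implementation would query an LLM here.
    return f"Summary: {user_input[:40]}"

# Explicit, testable requirements instead of implicit prompt assumptions.
REQUIREMENTS = [
    ("starts with 'Summary:'", lambda out: out.startswith("Summary:")),
    ("is at most 60 characters", lambda out: len(out) <= 60),
]

def score_prompt(prompt: str, inputs: list[str]) -> float:
    """Fraction of (input, requirement) pairs the prompt's outputs satisfy."""
    passed, total = 0, 0
    for user_input in inputs:
        out = call_llm(prompt, user_input)
        for _name, check in REQUIREMENTS:
            total += 1
            passed += bool(check(out))
    return passed / total

# Pick the candidate prompt that best satisfies the stated requirements.
candidates = [
    "Summarize the text.",
    "Summarize the text in one short line starting with 'Summary:'.",
]
inputs = ["The quick brown fox jumps over the lazy dog."]
best = max(candidates, key=lambda p: score_prompt(p, inputs))
```

With a real model behind `call_llm`, the same loop doubles as a monitoring harness: re-running `score_prompt` after a model update surfaces the silent regressions the paper attributes to underspecified prompts.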
Chenyang Yang
Carnegie Mellon University
Software Engineering · SE4AI · Human-AI Interaction
Yike Shi
Carnegie Mellon University
Qianou Ma
Carnegie Mellon University
Michael Xieyang Liu
Research Scientist, Google DeepMind
Human-AI Interaction · Sensemaking · End-user Programming · Human-Computer Interaction
Christian Kästner
Carnegie Mellon University
Tongshuang Wu
Carnegie Mellon University