🤖 AI Summary
This work addresses prompt underspecification in LLM-based software development: developer-authored natural language prompts often fail to capture requirements that matter to users. LLMs can guess unspecified requirements by default in 41.1% of cases, but this behavior is brittle. Underspecified prompts are twice as likely to regress under model updates or prompt changes, sometimes with accuracy drops of more than 20%. Simply adding more requirements to the prompt does not reliably help, because of LLMs' limited instruction-following capabilities and competing constraints, and standard prompt optimizers offer little benefit. To tackle this, we propose requirements-aware prompt optimization mechanisms that improve performance by 4.8% on average over baselines that naively specify every requirement in the prompt. Beyond optimization, we argue that managing underspecification requires a broader process covering proactive requirements discovery, evaluation, and monitoring.
📝 Abstract
Building LLM-powered software requires developers to communicate their requirements through natural language, but developer prompts are frequently underspecified, failing to fully capture many user-important requirements. In this paper, we present an in-depth analysis of prompt underspecification, showing that while LLMs can often (41.1%) guess unspecified requirements by default, such behavior is less robust: underspecified prompts are 2x more likely to regress across model or prompt changes, sometimes with accuracy drops of more than 20%. We then demonstrate that simply adding more requirements to a prompt does not reliably improve performance, due to LLMs' limited instruction-following capabilities and competing constraints, and that standard prompt optimizers do not offer much help. To address this, we introduce novel requirements-aware prompt optimization mechanisms that improve performance by 4.8% on average over baselines that naively specify everything in the prompt. Beyond prompt optimization, we envision that effectively managing prompt underspecification requires a broader process, including proactive requirements discovery, evaluation, and monitoring.