🤖 AI Summary
While large language models (LLMs) are increasingly deployed in software engineering, prompts, which are emerging as critical engineering artifacts, lack systematic practices and theoretical foundations for development, maintenance, and reuse. Method: This study formally defines prompts as core software engineering artifacts and establishes a theory-grounded framework covering prompt development, traceability, and evolution. Drawing on an exploratory survey of 74 practitioners from six countries, we analyze current practices and challenges. Contribution/Results: We find that prompt usage remains highly ad hoc, characterized by low reusability, poor documentation, and fragmented versioning. Prompt management is neither institutionalized nor standardized, which leads to knowledge loss and maintainability issues. Although systematic prompt management offers significant potential benefits, practitioners must weigh those benefits against the implementation overhead. Our contributions include the first prompt lifecycle model tailored to software engineering and an empirically informed, preliminary set of prompt management guidelines.
📝 Abstract
Developers now routinely interact with large language models (LLMs) to support a range of software engineering (SE) tasks. This prominent role positions prompts as potential SE artifacts that, like other artifacts, may require systematic development, documentation, and maintenance. However, little is known about how prompts are actually used and managed in LLM-integrated workflows, what challenges practitioners face, and whether the benefits of systematic prompt management outweigh the associated effort. To address this gap, we propose a research programme that (a) characterizes current prompt practices, challenges, and influencing factors in SE; (b) analyzes prompts as software artifacts, examining their evolution, traceability, reuse, and the trade-offs of systematic management; and (c) develops and empirically evaluates evidence-based guidelines for managing prompts in LLM-integrated workflows. As a first step, we conducted an exploratory survey with 74 software professionals from six countries to investigate current prompt practices and challenges. The findings reveal that prompt usage in SE is largely ad hoc: prompts are often refined through trial and error, rarely reused, and shaped more by individual heuristics than by standardized practices. These insights not only highlight the need for more systematic approaches to prompt management but also provide the empirical foundation for the subsequent stages of our research programme.