🤖 AI Summary
This study addresses the scarcity of real-world system requirements specifications (SyRSs), which are often inaccessible due to confidentiality and intellectual property constraints, thereby hindering research progress. To overcome this limitation, the authors propose a novel paradigm for generating high-fidelity synthetic requirements without access to authentic samples. By integrating systematic prompt engineering with cross-model validation using black-box large language models such as ChatGPT, they produce 300 synthetic SyRSs spanning ten distinct industries. The generated artifacts undergo iterative refinement through LLM-driven quality assessment and expert surveys. Experimental evaluation reveals that 62% of domain experts deem the synthetic requirements realistic; however, in-depth analysis uncovers persistent logical inconsistencies and defects, underscoring that current LLMs cannot yet fully replace human review in requirements validation.
📝 Abstract
System requirement specifications (SyRSs) are central natural-language (NL) artifacts. Access to real SyRSs for research purposes is highly valuable but limited by proprietary restrictions and confidentiality concerns. Generating synthetic SyRSs (SSyRSs) can address this scarcity. Black-box large language models (LLMs) such as ChatGPT offer compelling generation capabilities, providing easy access to NL generation without requiring real data. However, LLMs suffer from hallucinations and overconfidence, which pose major challenges to their use. We designed an exploratory study to investigate whether, despite these challenges, we can generate realistic SSyRSs with ChatGPT without access to real SyRSs. Using a systematic approach that leverages prompt patterns, LLM-based quality assessments, and iterative prompt refinements, we generated 300 SSyRSs across 10 industries with ChatGPT. The results were evaluated through cross-model checks and an expert study with n=87 submitted surveys. 62% of the experts considered the SSyRSs realistic. However, in-depth examination revealed contradictory statements and deficiencies. Overall, we were able to generate realistic SSyRSs to a certain extent with ChatGPT, but LLM-based quality assessments cannot fully replace thorough expert evaluations. This paper presents the methodology and results of our study and discusses the key insights we obtained.
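The generate → assess → refine loop described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration: the stub functions stand in for black-box LLM calls, and the paper's actual prompts, scoring rubric, and refinement strategy are not reproduced here.

```python
# Hypothetical sketch of iterative SSyRS generation with an LLM-based
# quality assessment. All three "model" functions are stubs; in a real
# pipeline they would call black-box LLM APIs (e.g. a generator model
# and a second critic model for the cross-model check).

def generator_llm(prompt: str) -> str:
    # Stub: produce a synthetic SyRS draft from the current prompt.
    return f"SSyRS draft for: {prompt}"

def critic_llm(ssyrs: str) -> float:
    # Stub: a second model scores the draft's realism/consistency.
    # Here the score simply grows with each refinement marker.
    return 0.5 + 0.1 * ssyrs.count("refined")

def refine(prompt: str, score: float) -> str:
    # Stub: tighten the prompt based on the critic's feedback.
    return prompt + " refined"

def generate_ssyrs(seed_prompt: str, threshold: float = 0.8,
                   max_iters: int = 5) -> str:
    """Loop: generate a draft, assess it, refine the prompt until the
    critic's score reaches the threshold or iterations run out."""
    prompt = seed_prompt
    draft = generator_llm(prompt)
    for _ in range(max_iters):
        score = critic_llm(draft)
        if score >= threshold:
            break
        prompt = refine(prompt, score)
        draft = generator_llm(prompt)
    return draft
```

As the abstract's expert study suggests, a loop like this can only raise quality up to a point: the critic model shares the generator's blind spots, so drafts that pass the automated check may still contain contradictions that only human reviewers catch.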