🤖 AI Summary
This study addresses the fragmented understanding of how large language models (LLMs) are applied in requirements engineering (RE). Through a systematic literature review (SLR) of 74 studies published between 2023 and 2024, it analyzes LLM adoption across language-intensive RE tasks, including requirements elicitation, validation, test case generation, and multi-source document analysis. The review examines prompting strategies (e.g., zero-/few-shot), model choices (predominantly the GPT series), evaluation methodologies, and the distribution of task coverage. Key findings reveal a strong concentration on elicitation and validation, uneven coverage across RE activities, prevalent reliance on manual or non-standardized evaluation metrics, and the absence of industrial-grade benchmarks. While LLMs demonstrably extend the scope of RE automation, toolchains and datasets remain fragmented. As a novel contribution, this work compiles the first comprehensive inventory of LLM-based tools and datasets tailored to RE. It concludes by proposing three future directions: improving reproducibility, enhancing domain-specific adaptation, and accelerating industrial deployment.
📝 Abstract
Large Language Models (LLMs) are finding applications in numerous domains, and Requirements Engineering (RE) is increasingly benefiting from their capabilities to assist with complex, language-intensive tasks. This paper presents a systematic literature review of 74 primary studies published between 2023 and 2024, examining how LLMs are being applied in RE. The study categorizes the literature along several dimensions, including publication trends, RE activities, prompting strategies, and evaluation methods. Our findings reveal notable patterns, including substantial differences from previous work that leveraged standard Natural Language Processing (NLP) techniques. Most studies focus on using LLMs for requirements elicitation and validation, rather than defect detection and classification, which dominated in the past. Researchers have also broadened their focus to address novel tasks, e.g., test generation, exploring the integration of RE with other software engineering (SE) disciplines. Although requirements specifications remain the primary focus, other artifacts are increasingly considered, including issues from issue tracking systems, regulations, and technical manuals. The studies mostly rely on GPT-based models and often use zero-shot or few-shot prompting. They are usually evaluated in controlled environments, with limited use in industry settings and limited integration into complex workflows. Our study outlines important future directions, such as expanding the influence of RE within SE, exploring less-studied tasks, improving prompting methods, and testing in real-world environments. Our contribution also helps researchers and practitioners use LLMs more effectively in RE by providing a curated list of tools and datasets that leverage LLMs for RE.