🤖 AI Summary
This study addresses the time-consuming, labor-intensive nature of stakeholder text analysis in requirements engineering (RE). It systematically evaluates GPT-4, Mistral, and LLaMA-2 for qualitative data analysis (QDA), assessing both inductive (zero-shot) and deductive (one-shot and few-shot) coding. Detailed, context-rich prompts substantially improve agreement with human coders in deductive coding: GPT-4 achieves a Cohen’s Kappa of 0.71 under few-shot settings, which indicates substantial agreement with human analysts, and its output is highly stable across repeated runs, whereas zero-shot performance remains limited. The resulting structured labels also provide traceability of requirements and can be used directly as classes in domain models. The results suggest that large language models (LLMs) can reduce manual annotation effort in QDA for RE while maintaining annotation quality.
📝 Abstract
Requirements Engineering (RE) is essential for developing complex and regulated software projects. Given the challenges in transforming stakeholder inputs into consistent software designs, Qualitative Data Analysis (QDA) provides a systematic approach to handling free-form data. However, traditional QDA methods are time-consuming and heavily reliant on manual effort. In this paper, we explore the use of Large Language Models (LLMs), including GPT-4, Mistral, and LLaMA-2, to improve QDA tasks in RE. Our study evaluates LLMs' performance in inductive (zero-shot) and deductive (one-shot, few-shot) annotation tasks, revealing that GPT-4 achieves substantial agreement with human analysts in deductive settings, with Cohen's Kappa scores exceeding 0.7, while zero-shot performance remains limited. Detailed, context-rich prompts significantly improve annotation accuracy and consistency, particularly in deductive scenarios, and GPT-4 demonstrates high reliability across repeated runs. These findings highlight the potential of LLMs to support QDA in RE by reducing manual effort while maintaining annotation quality. The structured labels automatically provide traceability of requirements and can be directly utilized as classes in domain models, facilitating systematic software design.
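The agreement figures above are Cohen's Kappa values, which correct raw agreement between two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that computation, not the authors' evaluation code. The label names and the two annotation lists are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # p_o: observed proportion of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # p_e: agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of six requirement statements (illustrative only).
human = ["functional", "functional", "quality", "constraint", "quality", "functional"]
llm = ["functional", "quality", "quality", "constraint", "quality", "functional"]

kappa = cohen_kappa(human, llm)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 5 of 6 items (p_o = 5/6) and chance agreement is p_e = 13/36, so kappa = 17/23, about 0.739. On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement.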