From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering

📅 2025-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the time-consuming, labor-intensive nature of analyzing stakeholder text in requirements engineering (RE). It evaluates GPT-4, Mistral, and LLaMA-2 on qualitative data analysis (QDA), assessing both inductive (zero-shot) and deductive (one-shot and few-shot) coding. Detailed, context-rich prompts improve annotation accuracy and consistency, particularly in deductive coding: GPT-4 achieves a Cohen's Kappa of 0.71 under few-shot settings, which indicates substantial agreement with human analysts, and it remains stable across repeated runs. Zero-shot performance, by contrast, remains limited. The resulting labels also provide requirements traceability and can serve directly as classes in domain models, supporting structured modeling. Overall, the results suggest that large language models (LLMs) can reduce manual annotation effort for QDA in RE while maintaining annotation quality.

📝 Abstract

Requirements Engineering (RE) is essential for developing complex and regulated software projects. Given the challenges in transforming stakeholder inputs into consistent software designs, Qualitative Data Analysis (QDA) provides a systematic approach to handling free-form data. However, traditional QDA methods are time-consuming and heavily reliant on manual effort. In this paper, we explore the use of Large Language Models (LLMs), including GPT-4, Mistral, and LLaMA-2, to improve QDA tasks in RE. Our study evaluates LLMs' performance in inductive (zero-shot) and deductive (one-shot, few-shot) annotation tasks, revealing that GPT-4 achieves substantial agreement with human analysts in deductive settings, with Cohen's Kappa scores exceeding 0.7, while zero-shot performance remains limited. Detailed, context-rich prompts significantly improve annotation accuracy and consistency, particularly in deductive scenarios, and GPT-4 demonstrates high reliability across repeated runs. These findings highlight the potential of LLMs to support QDA in RE by reducing manual effort while maintaining annotation quality. The structured labels automatically provide traceability of requirements and can be directly utilized as classes in domain models, facilitating systematic software design.
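The agreement figures above are reported as Cohen's Kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of that statistic, not the authors' evaluation code; the function name `cohens_kappa` and the example labels are invented here for illustration and are not data from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over all labels either annotator used.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: a human analyst and an LLM label six requirement statements.
human = ["functional", "functional", "security", "usability", "security", "functional"]
llm = ["functional", "functional", "security", "security", "security", "functional"]

kappa = cohens_kappa(human, llm)
print(f"{kappa:.3f}")  # prints 0.714
```

In this toy case the annotators agree on five of six items (0.833 raw agreement), but chance agreement is 0.417, so Kappa is lower than the raw figure. This is why Kappa is a stricter measure than simple percent agreement.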

Problem

Research questions and friction points this paper is trying to address.

Automating qualitative data analysis in requirements engineering using LLMs

Reducing manual effort in transforming stakeholder inputs to software designs

Evaluating LLM performance in inductive and deductive annotation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs (GPT-4, Mistral, and LLaMA-2) for qualitative data analysis

Evaluates inductive and deductive annotation tasks

Improves accuracy with context-rich prompts
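To make the idea of a context-rich deductive prompt concrete, here is a hedged sketch of how a few-shot annotation prompt might be assembled from a predefined codebook and labeled examples. The paper's actual prompt wording is not given in this summary, so the function `build_deductive_prompt`, the prompt text, and the sample codebook are all assumptions made for illustration.

```python
def build_deductive_prompt(codebook, examples, requirement):
    """Assemble a deductive (one-shot or few-shot) annotation prompt.

    codebook:    dict mapping each allowed label to its definition.
    examples:    list of (requirement_text, label) pairs; an empty list
                 yields a prompt with a codebook but no demonstrations.
    requirement: the new statement the model should label.
    """
    lines = [
        "You are annotating software requirements.",
        "Assign exactly one label from the codebook below.",
        "",
        "Codebook:",
    ]
    for label, definition in codebook.items():
        lines.append(f"- {label}: {definition}")

    if examples:
        lines += ["", "Examples:"]
        for text, label in examples:
            if label not in codebook:
                raise ValueError(f"Example label {label!r} is not in the codebook.")
            lines += [f"Requirement: {text}", f"Label: {label}"]

    # The prompt ends with an open label slot for the model to complete.
    lines += ["", f"Requirement: {requirement}", "Label:"]
    return "\n".join(lines)


# Hypothetical codebook and examples, not taken from the paper.
codebook = {
    "Security": "Protection of data and access control.",
    "Usability": "Ease of use and user experience.",
}
examples = [
    ("Users must log in with two-factor authentication.", "Security"),
    ("The dashboard should be readable on a phone screen.", "Usability"),
]

prompt = build_deductive_prompt(
    codebook, examples, "All stored passwords must be hashed."
)
print(prompt)
```

The resulting string could be sent to any chat or completion model. Keeping the codebook definitions in the prompt is one way to supply the context that the paper reports as improving deductive annotation.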

🔎 Similar Papers

Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis