Context-Adaptive Requirements Defect Prediction through Human-LLM Collaboration

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an adaptive defect prediction mechanism grounded in human feedback to address the contextual variability of “defect” definitions across projects, domains, and stakeholders—a limitation of traditional approaches that rely on generic patterns. The method models defect prediction as a continuous learning loop between a large language model (LLM) and the user, leveraging chain-of-thought reasoning to produce interpretable predictions and dynamically refining few-shot prompts using a small set of user-validated examples. By moving beyond static classification paradigms, the approach achieves significantly better performance than standard few-shot prompting and fine-tuned BERT models on the QuRE benchmark with only 20 labeled samples, while maintaining high recall and delivering context-sensitive, explainable predictions.

📝 Abstract
Automated requirements assessment traditionally relies on universal patterns as proxies for defectiveness, implemented through rule-based heuristics or machine learning classifiers trained on large annotated datasets. However, what constitutes a "defect" is inherently context-dependent and varies across projects, domains, and stakeholder interpretations. In this paper, we propose a Human-LLM Collaboration (HLC) approach that treats defect prediction as an adaptive process rather than a static classification task. HLC leverages LLM Chain-of-Thought reasoning in a feedback loop: users validate predictions alongside their explanations, and these validated examples adaptively guide future predictions through few-shot learning. We evaluate this approach using the weak word smell on the QuRE benchmark of 1,266 annotated Mercedes-Benz requirements. Our results show that HLC effectively adapts to the provision of validated examples, with rapid performance gains from as few as 20 validated examples. Incorporating validated explanations, not just labels, enables HLC to substantially outperform both standard few-shot prompting and fine-tuned BERT models while maintaining high recall. These results highlight how the in-context and Chain-of-Thought learning capabilities of LLMs enable adaptive classification approaches that move beyond one-size-fits-all models, creating opportunities for tools that learn continuously from stakeholder feedback.
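The feedback loop the abstract describes can be sketched in a few lines. This is a hypothetical illustration only: the paper's actual prompts, model, and validation interface are not shown here, so the class and field names below (`ValidatedExample`, `HLCLoop`, `build_prompt`, `incorporate_feedback`) are assumptions, and the LLM call itself is omitted.

```python
# Minimal sketch of the Human-LLM Collaboration (HLC) loop, assuming a
# few-shot prompt that is rebuilt from user-validated (label, explanation)
# pairs. All names are illustrative, not from the paper.
from dataclasses import dataclass, field


@dataclass
class ValidatedExample:
    requirement: str
    label: str          # e.g. "defect" or "ok"
    explanation: str    # user-validated chain-of-thought rationale


@dataclass
class HLCLoop:
    examples: list = field(default_factory=list)  # grows with user feedback

    def incorporate_feedback(self, ex: ValidatedExample) -> None:
        """The user validated a prediction plus its explanation; keep it."""
        self.examples.append(ex)

    def build_prompt(self, requirement: str) -> str:
        """Few-shot prompt refined with the validated examples so far."""
        shots = "\n\n".join(
            f"Requirement: {e.requirement}\n"
            f"Reasoning: {e.explanation}\n"
            f"Label: {e.label}"
            for e in self.examples
        )
        return (
            f"{shots}\n\nRequirement: {requirement}\n"
            "Think step by step, then output a label."
        )


loop = HLCLoop()
loop.incorporate_feedback(ValidatedExample(
    requirement="The system shall usually respond quickly.",
    label="defect",
    explanation="'usually' and 'quickly' are weak words: ambiguous and untestable.",
))
prompt = loop.build_prompt("The system shall log all user events.")
```

Each call to `incorporate_feedback` enlarges the in-context example set, so subsequent prompts carry the project-specific notion of "defect" that the validated explanations encode.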
Problem

Research questions and friction points this paper is trying to address.

requirements defect prediction
context-adaptive
human-LLM collaboration
defectiveness
stakeholder feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-LLM Collaboration
Chain-of-Thought Reasoning
Context-Adaptive Prediction
Few-Shot Learning
Requirements Defect Prediction