A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In agile development, ambiguously defined epics frequently lead to requirement rework, delivery delays, and cost overruns. This paper presents the first empirical study investigating the potential of large language models (LLMs) for automating epic quality assessment. Leveraging an industry case study and a user survey with 17 product managers, the authors propose an LLM-driven evaluation paradigm that can be integrated into real-world workflows. The approach systematically identifies key quality dimensions, including completeness, testability, and stakeholder alignment, and delineates human-AI collaboration pathways for iterative refinement. Results demonstrate significant improvements in assessment efficiency and consistency, high user satisfaction, and effective support for early-stage requirement governance. This work bridges an empirical gap in AI-augmented agile requirements engineering and provides both methodological foundations and practical evidence for deploying LLMs in software process improvement.

📝 Abstract
The broad availability of generative AI offers new opportunities to support various work domains, including agile software development. Agile epics are a key artifact for product managers to communicate requirements to stakeholders. However, in practice, they are often poorly defined, leading to churn, delivery delays, and cost overruns. In this industry case study, we investigate opportunities for large language models (LLMs) to evaluate agile epic quality in a global company. Results from a user study with 17 product managers indicate how LLM evaluations could be integrated into their work practices, including perceived values and usage in improving their epics. High levels of satisfaction indicate that agile epics are a new, viable application of AI evaluations. However, our findings also outline challenges, limitations, and adoption barriers that can inform both practitioners and researchers on the integration of such evaluations into future agile work practices.
Problem

Research questions and friction points this paper is trying to address.

Investigating generative AI's role in evaluating agile epic quality
Addressing poor epic definitions causing delays and cost overruns
Exploring LLM integration challenges in agile work practices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to evaluate agile epic quality
Integrating AI evaluations into product management
Addressing challenges in AI adoption for agile
Werner Geyer
Chief Scientist Human-Centered Trustworthy AI & Principal Research Scientist, IBM Research
Human-Centered AI · HCI · CSCW · AI · Recommender Systems
Jessica He
IBM Research, Seattle, WA, USA
Daita Sarkar
IBM, Kochi, India
Michelle Brachman
IBM Research, Cambridge, MA, USA
Chris Hammond
IBM, Austin, TX, USA
Jennifer Heins
IBM, Durham, NC, USA
Zahra Ashktorab
IBM Research
Human-Computer Interaction
Carlos Rosemberg
IBM, Canada
Charlie Hill
IBM, Cambridge, MA, USA