🤖 AI Summary
This study addresses the lack of systematic understanding of the long-term, multidimensional impact of generative AI on productivity in industrial-scale agile teams. In a 13-month longitudinal multi-case investigation, we integrate development telemetry from Jira, Git, and SonarQube with qualitative survey data under the multidimensional SPACE productivity framework. Our findings reveal, for the first time, that generative AI enhances productivity primarily by increasing the value density of work rather than by amplifying activity volume. Team performance and developers’ perceived efficiency improve significantly, while overall activity levels remain stable. These outcomes underscore the need for multidimensional evaluation frameworks and provide empirical evidence of generative AI’s efficacy in real-world agile software development environments.
📝 Abstract
Context: Generative Artificial Intelligence (GenAI) tools, such as GitHub Copilot and GPT-based tools, represent a paradigm shift in software engineering. While their impact is clear, most studies are short-term and focus on individual experiments. The sustained, team-level effects on productivity within industrial agile environments remain largely uncharacterized.
Goal: This study aims to provide a longitudinal evaluation of GenAI's impact on agile software teams. We characterize its effect on developers' productivity by applying the multi-dimensional SPACE framework.
Method: We conducted a multi-case longitudinal study involving three agile teams at a large technology consulting firm over approximately 13 months. We collected quantitative telemetry (Jira, SonarQube, Git) and qualitative survey data, and compared historical (pre-adoption) sprints with research (post-adoption) sprints.
Conclusion: GenAI tools can significantly improve team performance and well-being. Our key finding is a sharp increase in Performance and perceived Efficiency concurrent with flat developer Activity. This suggests that GenAI increases the value density of development work, not its volume. The finding supports the need for multi-dimensional frameworks like SPACE to capture the nuanced impact of GenAI in situ, an impact that would be invisible to studies measuring Activity alone.