🤖 AI Summary
Existing automated code summarization and requirements traceability methods target developers and overlook software evolution, so they fail to help end users verify whether AI-generated software aligns with their intent.
Method: We propose the first multi-agent framework for user-level requirement (UR) generation and live traceability chain recovery. It orchestrates four specialized agents (Code Reviewer, Searcher, Writer, and Verifier) that jointly leverage project evolution history and domain context to progressively perform dependency structuring, implementation-level requirement derivation, and UR synthesis.
Contribution/Results: Our approach balances requirement completeness and evolution sensitivity. Empirical evaluation shows it surpasses an established baseline in UR completeness, correctness, and helpfulness, and outperforms five state-of-the-art traceability approaches in trace link precision. A user study confirms it significantly enhances users' ability to validate whether AI-generated software aligns with their intent.
📝 Abstract
Software maintainability critically depends on high-quality requirements descriptions and explicit traceability between requirements and code. Although automated code summarization (ACS) and requirements traceability (RT) techniques have been widely studied, existing ACS methods mainly generate implementation-level (i.e., developer-oriented) requirements (IRs) for fine-grained units (e.g., methods), while RT techniques often overlook the impact of project evolution. As a result, user-level (i.e., end user-oriented) requirements (URs) and live trace links remain underexplored, despite their importance for supporting user understanding and for validating whether AI-generated software aligns with user intent. To address this gap, we propose UserTrace, a multi-agent system that automatically generates URs and recovers live trace links (from URs to IRs to code) from software repositories. UserTrace coordinates four specialized agents (i.e., Code Reviewer, Searcher, Writer, and Verifier) through a three-phase process: structuring repository dependencies, deriving IRs for code units, and synthesizing URs with domain-specific context. Our comparative evaluation shows that UserTrace produces URs with higher completeness, correctness, and helpfulness than an established baseline, and achieves superior precision in trace link recovery compared to five state-of-the-art RT approaches. A user study further demonstrates that UserTrace helps end users validate whether the AI-generated repositories align with their intent.