🤖 AI Summary
To address the prevalent issues of incompleteness, unreliability, and factual inaccuracy in LLM-generated code documentation, this paper proposes the first topology-aware, multi-agent collaborative documentation generation framework. Methodologically: (1) it introduces a novel topology-driven incremental context construction mechanism, dynamically modeling code structure via Program Dependence Graphs (PDGs); (2) it designs a five-role collaborative architecture—Reader, Searcher, Writer, Verifier, and Orchestrator—integrating modular prompt engineering with multi-stage verification; (3) it establishes a comprehensive evaluation framework spanning completeness, helpfulness, and truthfulness. The approach achieves significant improvements over state-of-the-art methods across multiple real-world codebases, and ablation studies show that the topology-aware processing order boosts truthfulness by 37.2%. Moreover, the framework performs robustly on complex, private repositories, enabling reliable, high-fidelity documentation generation.
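The topology-driven ordering described above can be illustrated with a minimal sketch: components are documented in dependency order, so each component's prompt context can incrementally include the documentation already generated for its dependencies. This is an assumption-laden toy (the component names and the `generate_docs` helper are hypothetical, not from the paper's codebase), using Python's standard-library `graphlib` rather than a full PDG analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical mini-repository: each component maps to the set of
# components it depends on (names are illustrative only).
dependencies = {
    "utils.parse_config": set(),
    "models.Encoder": {"utils.parse_config"},
    "models.Decoder": {"utils.parse_config"},
    "pipeline.run": {"models.Encoder", "models.Decoder"},
}

def topological_doc_order(deps):
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())

def generate_docs(deps):
    """Incrementally build context: each component 'sees' the docs
    already written for its dependencies (stand-in for an LLM call)."""
    docs = {}
    for component in topological_doc_order(deps):
        context = {d: docs[d] for d in deps[component]}  # already documented
        docs[component] = f"doc({component}) using {sorted(context)}"
    return docs

docs = generate_docs(dependencies)
```

In this ordering, leaf utilities are documented first and high-level entry points last, which is what lets the later documentation passes ground their claims in already-verified context.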
📝 Abstract
High-quality code documentation is crucial for software development, especially in the era of AI. However, generating it automatically with Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete, unhelpful, or factually incorrect outputs. We introduce DocAgent, a novel multi-agent collaborative system that uses topological code processing for incremental context building. Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation. We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness. Comprehensive experiments show that DocAgent consistently and significantly outperforms baselines. Our ablation study confirms the vital role of the topological processing order. DocAgent offers a robust approach to reliable code documentation generation in complex and proprietary repositories.