🤖 AI Summary
Developers spend approximately 58% of their time understanding codebases, yet current large language models (LLMs) generate only function-level documentation and fail to capture repository-scale architectural patterns and cross-module interactions. Method: We propose CodeWiki, the first fully automated framework for repository-level documentation generation across seven programming languages. It employs hierarchical decomposition, recursive agent collaboration, and dynamic delegation to model system architecture and data flow, producing both natural-language descriptions and visual artifacts, including architecture and data flow diagrams. The framework integrates multimodal content synthesis and introduces CodeWikiBench, a dedicated benchmark for automated quality assessment. Contribution/Results: On CodeWikiBench, CodeWiki achieves quality scores of 68.79% with closed-source models and 64.80% with open-source models, significantly outperforming existing baselines and demonstrating scalable, accurate repository-level documentation generation.
📝 Abstract
Developers spend nearly 58% of their time understanding codebases, yet maintaining comprehensive documentation remains challenging due to complexity and manual effort. While recent Large Language Models (LLMs) show promise for function-level documentation, they fail at the repository level, where capturing architectural patterns and cross-module interactions is essential. We introduce CodeWiki, the first open-source framework for holistic repository-level documentation across seven programming languages. CodeWiki employs three innovations: (i) hierarchical decomposition that preserves architectural context, (ii) recursive agentic processing with dynamic delegation, and (iii) synthesis of textual and visual artifacts, including architecture diagrams and data flows. We also present CodeWikiBench, the first repository-level documentation benchmark with multi-level rubrics and agentic assessment. CodeWiki achieves a 68.79% quality score with proprietary models and 64.80% with open-source alternatives, outperforming existing closed-source systems and demonstrating scalable, accurate documentation for real-world repositories.
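The hierarchical decomposition and recursive delegation described above can be pictured with a minimal sketch. This is an illustrative toy, not CodeWiki's actual API: the `Module` type and `document` function are assumptions, and a real recursive agentic system would delegate each submodule to an LLM agent rather than a plain recursive call.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Module:
    """A node in the repository's module hierarchy (hypothetical)."""
    name: str
    files: List[str] = field(default_factory=list)
    submodules: List["Module"] = field(default_factory=list)

def document(module: Module, indent: int = 0) -> str:
    """Recursively produce a documentation outline for a module tree.

    Each level summarizes its own files, then delegates each submodule
    to a recursive call; in the described framework, that recursive
    step would be handled by a delegated agent with its own context.
    """
    pad = "  " * indent
    lines = [f"{pad}{module.name}: {len(module.files)} file(s)"]
    for sub in module.submodules:
        lines.append(document(sub, indent + 1))
    return "\n".join(lines)

# Toy repository layout, invented for illustration.
repo = Module("repo", files=["main.py"], submodules=[
    Module("core", files=["engine.py", "utils.py"]),
    Module("api", files=["routes.py"],
           submodules=[Module("v1", files=["users.py"])]),
])
print(document(repo))
```

Because each call sees only one module and its direct children, the decomposition keeps per-agent context small while the aggregated outline still reflects the whole repository's structure.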