🤖 AI Summary
Current large language models struggle to generate complete, multi-file code repositories from scratch, largely because natural language descriptions fail to capture cross-file and cross-module dependencies and interfaces faithfully. To address this, we propose the Repository Planning Graph (RPG), a graph-based representation that unifies *proposal-level* specifications (functional requirements) and *implementation-level* abstractions (file organization, data flow, functions). RPG supports progressive refinement, scales to large repositories, and improves LLM understanding of repository structure. Building on RPG, the ZeroRepo framework works in three stages: proposal-level planning and implementation-level refinement construct the graph, and graph-guided code generation with test validation produces the code. On RepoCraft, a benchmark of six real-world projects with 1,052 tasks, ZeroRepo generates repositories averaging nearly 36K lines of code and achieves 81.5% functional coverage and a 69.7% pass rate, surpassing Claude Code by 27.3 and 35.8 percentage points, respectively.
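The idea of one graph holding both functional requirements and implementation structure can be made concrete with a small sketch. Everything below is a hypothetical illustration, not the paper's actual schema. The `PlanningGraph` class, the node kinds (capability, file, function; function nodes follow the abstract's mention of functions), and the edge kinds `REALIZED_BY` and `DATA_FLOW` are all assumptions. Capabilities link down to the files and functions that realize them, and data-flow edges connect implementation nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    # Hypothetical node types: one proposal-level kind, two implementation-level kinds
    CAPABILITY = "capability"  # a functional requirement
    FILE = "file"              # a source file in the planned layout
    FUNCTION = "function"      # a function or class inside a file


class EdgeKind(Enum):
    REALIZED_BY = "realized_by"  # capability -> file, file -> function
    DATA_FLOW = "data_flow"      # producer -> consumer between implementation nodes


@dataclass
class PlanningGraph:
    """Minimal sketch of a repository planning graph (illustrative only)."""
    kinds: dict[str, NodeKind] = field(default_factory=dict)
    out_edges: dict[str, list[tuple[EdgeKind, str]]] = field(default_factory=dict)

    def add_node(self, name: str, kind: NodeKind) -> None:
        self.kinds[name] = kind

    def add_edge(self, src: str, dst: str, kind: EdgeKind) -> None:
        # Reject edges to undeclared nodes so the plan stays consistent
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError(f"unknown node in edge {src!r} -> {dst!r}")
        self.out_edges.setdefault(src, []).append((kind, dst))

    def files_for(self, capability: str) -> set[str]:
        """Files that realize a capability (proposal level -> implementation level)."""
        return {dst for kind, dst in self.out_edges.get(capability, [])
                if kind is EdgeKind.REALIZED_BY and self.kinds[dst] is NodeKind.FILE}

    def consumers_of(self, node: str) -> set[str]:
        """Nodes that receive data produced by `node`."""
        return {dst for kind, dst in self.out_edges.get(node, [])
                if kind is EdgeKind.DATA_FLOW}


# Usage: one capability realized by a file, whose function feeds another function
g = PlanningGraph()
g.add_node("parse CSV input", NodeKind.CAPABILITY)
g.add_node("io/reader.py", NodeKind.FILE)
g.add_node("io/reader.py::read_csv", NodeKind.FUNCTION)
g.add_node("model/fit.py::fit", NodeKind.FUNCTION)
g.add_edge("parse CSV input", "io/reader.py", EdgeKind.REALIZED_BY)
g.add_edge("io/reader.py", "io/reader.py::read_csv", EdgeKind.REALIZED_BY)
g.add_edge("io/reader.py::read_csv", "model/fit.py::fit", EdgeKind.DATA_FLOW)
print(g.files_for("parse CSV input"))  # {'io/reader.py'}
```

Storing every layer in one structure means a query can move from a requirement to the code that serves it without re-reading prose. That is the property the summary attributes to RPG.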
📝 Abstract
Large language models excel at function- and file-level code generation, yet generating complete repositories from scratch remains a fundamental challenge. This process demands coherent and reliable planning across proposal- and implementation-level stages, while natural language, due to its ambiguity and verbosity, is ill-suited for faithfully representing complex software structures. To address this, we introduce the Repository Planning Graph (RPG), a persistent representation that unifies proposal- and implementation-level planning by encoding capabilities, file structures, data flows, and functions in one graph. RPG replaces ambiguous natural language with an explicit blueprint, enabling long-horizon planning and scalable repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework for repository generation from scratch. It operates in three stages: proposal-level planning and implementation-level refinement to construct the graph, followed by graph-guided code generation with test validation. To evaluate this setting, we construct RepoCraft, a benchmark of six real-world projects with 1,052 tasks. On RepoCraft, ZeroRepo produces repositories averaging nearly 36K LOC, roughly 3.9× the strongest baseline (Claude Code) and about 64× the other baselines. It attains 81.5% functional coverage and a 69.7% pass rate, exceeding Claude Code by 27.3 and 35.8 percentage points, respectively. Further analysis shows that RPG models complex dependencies, enables progressively more sophisticated planning through near-linear scaling, and enhances LLM understanding of repositories, thereby accelerating agent localization.
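The abstract does not say how the graph guides generation. One plausible reading of "graph-guided code generation with test validation" is to visit implementation nodes in dependency order, so each unit is written after the code it depends on, and to keep only output that passes its tests. The Python sketch below follows that reading and is an assumption, not ZeroRepo's documented procedure. The function `generate_repository`, the `generate` and `validate` callables (stand-ins for an LLM call and a test runner), and the `max_attempts` retry limit are all hypothetical. Dependency order comes from the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical stand-ins: a real system would call an LLM and run unit tests here
Generate = Callable[[str, dict[str, str]], str]  # (node, accepted code of deps) -> source
Validate = Callable[[str, str], bool]            # (node, source) -> do its tests pass?


def generate_repository(
    depends_on: dict[str, set[str]],
    generate: Generate,
    validate: Validate,
    max_attempts: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Generate code node by node in dependency order, keeping only validated code.

    Returns the accepted sources and the nodes that never passed validation.
    """
    accepted: dict[str, str] = {}
    failed: list[str] = []
    # static_order() yields a node only after all of its predecessors (dependencies)
    for node in TopologicalSorter(depends_on).static_order():
        # Pass along the already-accepted code of this node's dependencies as context
        context = {d: accepted[d] for d in depends_on.get(node, set()) if d in accepted}
        for _ in range(max_attempts):
            source = generate(node, context)
            if validate(node, source):
                accepted[node] = source
                break
        else:
            # Every attempt failed its tests; record the node instead of keeping bad code
            failed.append(node)
    return accepted, failed


# Usage with toy stand-ins: fit depends on normalize, which depends on read_csv
deps = {"fit": {"read_csv", "normalize"}, "normalize": {"read_csv"}, "read_csv": set()}
calls: list[str] = []


def fake_generate(node: str, context: dict[str, str]) -> str:
    calls.append(node)
    return f"def {node}(): ...  # uses {sorted(context)}"


code, failed = generate_repository(deps, fake_generate, lambda node, src: True)
print(calls)  # ['read_csv', 'normalize', 'fit']
```

Ordering by dependencies gives each generation step the interfaces it needs. The validation gate stops a failing unit from being silently built upon.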