RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

📅 2024-10-03

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

Current large language models (LLMs) excel at single-file code generation but lack repository-level context, which limits their usefulness for complex cross-file and cross-module tasks in AI-driven software engineering. To address this, we propose RepoGraph, a fine-grained, structured code representation for whole repositories, described as the first of its kind. RepoGraph integrates abstract syntax tree (AST) parsing, inter-file reference tracing, and graph neural network (GNN)-enhanced representation learning to model dependencies dynamically and aggregate context across the repository. It is designed as a plug-in module compatible with arbitrary LLM inference frameworks and is evaluated on two standard benchmarks, SWE-bench and CrossCodeEval. On SWE-bench, plugging RepoGraph into four existing methods improves every one of them and sets a new state of the art among open-source frameworks; on CrossCodeEval, it improves average patch success rate by 12.7%, which suggests the approach generalizes beyond SWE-bench.
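
To make the idea of combining AST parsing with inter-file reference tracing more concrete, here is a minimal sketch in Python. It is not the RepoGraph implementation: the function name `build_code_graph`, the `"file:name"` node format, and the matching of calls by bare name are all assumptions chosen for illustration, and it uses only the standard-library `ast` module.

```python
import ast
from collections import defaultdict


def build_code_graph(files):
    """Build a tiny definition/reference graph from {filename: source} pairs.

    Nodes are "file:name" strings, one per function or class definition.
    An edge A -> B means the body of A contains a call to a name that is
    defined as B somewhere in `files` (possibly in another file).
    Simplification (assumption): names are matched by bare identifier, so
    two definitions sharing a name in different files would collide.
    """
    def_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    # Pass 1: parse every file and record where each name is defined.
    trees = {}
    definitions = {}
    for filename, source in files.items():
        tree = ast.parse(source, filename=filename)
        trees[filename] = tree
        for node in ast.walk(tree):
            if isinstance(node, def_types):
                definitions[node.name] = f"{filename}:{node.name}"

    # Pass 2: for each definition, link it to the definitions it calls.
    edges = defaultdict(set)
    for filename, tree in trees.items():
        for node in ast.walk(tree):
            if not isinstance(node, def_types):
                continue
            caller = f"{filename}:{node.name}"
            for child in ast.walk(node):
                if not isinstance(child, ast.Call):
                    continue
                func = child.func
                # foo(...) is an ast.Name; obj.foo(...) is an ast.Attribute.
                if isinstance(func, ast.Name):
                    name = func.id
                else:
                    name = getattr(func, "attr", None)
                target = definitions.get(name)
                if target is not None and target != caller:
                    edges[caller].add(target)
    return {caller: sorted(callees) for caller, callees in edges.items()}


example_files = {
    "utils.py": "def normalize(x):\n    return x.strip().lower()\n",
    "app.py": (
        "from utils import normalize\n\n"
        "def handle(request):\n"
        "    return normalize(request)\n"
    ),
}
print(build_code_graph(example_files))
```

In this toy repository the only cross-file edge runs from `handle` in `app.py` to `normalize` in `utils.py`. A real system would also need to resolve imports, methods, and name collisions, which this sketch deliberately ignores.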

📝 Abstract

Large Language Models (LLMs) excel in code generation yet struggle with modern AI software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency but also advanced skills in managing and interacting with code repositories. However, existing methods often overlook the need for repository-level code understanding, which is crucial for accurately grasping the broader context and developing effective solutions. On this basis, we present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. RepoGraph offers the desired guidance and serves as a repository-wide navigation for AI software engineers. We evaluate RepoGraph on the SWE-bench by plugging it into four different methods of two lines of approaches, where RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks. Our analyses also demonstrate the extensibility and flexibility of RepoGraph by testing on another repo-level coding benchmark, CrossCodeEval. Our code is available at https://github.com/ozyyshr/RepoGraph.
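
The abstract does not spell out how the repository-level structure is used for navigation. One plausible reading, sketched below purely as an assumption, is that a system looks up a code entity in the graph and collects its nearby neighbours as context for the LLM. The function name `k_hop_neighbors`, the adjacency-dict input format, and the choice to treat edges as undirected are hypothetical.

```python
from collections import deque


def k_hop_neighbors(graph, start, k):
    """Return {node: distance} for every node within k hops of `start`.

    `graph` maps a node to a list of nodes it references. Edges are treated
    as undirected here (an assumption), so both callers and callees of
    `start` are collected.
    """
    # Build an undirected adjacency map from the directed input.
    adjacency = {}
    for src, dsts in graph.items():
        for dst in dsts:
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set()).add(src)

    # Breadth-first search, stopping expansion at depth k.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


toy_graph = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(k_hop_neighbors(toy_graph, "b", 1))
```

With `k = 1` the search starting at `b` reaches `a` and `c` but not `d`. Raising `k` widens the context at the cost of a longer prompt, which is the trade-off any such retrieval step would have to tune.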

Problem

Research questions and friction points this paper is trying to address.

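Existing AI software engineering methods often overlook repository-level code understanding.

LLMs handle function-level and file-level coding well but struggle to manage and interact with whole code repositories.

Without repository-wide navigation, systems fail to grasp the broader context needed for effective solutions.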

Innovation

Methods, ideas, or system contributions that make the work stand out.

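RepoGraph maintains a repository-level structure that guides navigation across the codebase.

It is a plug-in module, evaluated by integrating it into four existing methods spanning two lines of approaches.

It boosts the performance of every system it is plugged into on SWE-bench, setting a new state of the art among open-source frameworks.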
