RAMBO: Enhancing RAG-based Repository-Level Method Body Completion

📅 2024-09-23
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
To address cross-module dependencies, custom API usage, and project-specific conventions in Method Body Completion (MBC) for large code repositories, this paper proposes RAMBO, a repository-level Retrieval-Augmented Generation (RAG) approach. Unlike conventional approaches that retrieve similar method bodies, RAMBO retrieves the key repository-specific code elements (classes, methods, and variables/fields) along with their relevant usages within the repository. It combines semantic-aware element identification, context-aware usage aggregation, and generation with large language models. Experiments with leading code LLMs across 40 Java projects show improvements over state-of-the-art repository-level MBC approaches of up to 46% in BLEU, 57% in CodeBLEU, and 36% in Compilation Rate, and up to 3X in Exact Match. RAMBO also surpasses the RepoCoder Oracle method by up to 12% in Exact Match.

📝 Abstract
Code completion is essential in software development, helping developers by predicting code snippets based on context. Among completion tasks, Method Body Completion (MBC) is particularly challenging as it involves generating complete method bodies based on their signatures and context. This task becomes significantly harder in large repositories, where method bodies must integrate repository-specific elements such as custom APIs, inter-module dependencies, and project-specific conventions. In this paper, we introduce RAMBO, a novel RAG-based approach for repository-level MBC. Instead of retrieving similar method bodies, RAMBO identifies essential repository-specific elements, such as classes, methods, and variables/fields, and their relevant usages. By incorporating these elements and their relevant usages into the code generation process, RAMBO ensures more accurate and contextually relevant method bodies. Our experimental results with leading code LLMs across 40 Java projects show that RAMBO significantly outperformed the state-of-the-art repository-level MBC approaches, with improvements of up to 46% in BLEU, 57% in CodeBLEU, 36% in Compilation Rate, and up to 3X in Exact Match. Notably, RAMBO surpassed the RepoCoder Oracle method by up to 12% in Exact Match, setting a new benchmark for repository-level MBC.
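To make the reported metrics more concrete, here is a minimal sketch of how an Exact Match rate and a "times the baseline" ratio could be computed. The whitespace normalization, the function names, and the toy inputs are all assumptions made for illustration. They are not taken from the paper, whose exact evaluation procedure is not described in this summary.

```python
from typing import Sequence


def normalize(code: str) -> str:
    """Collapse all whitespace runs so formatting differences are ignored (assumed rule)."""
    return " ".join(code.split())


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted method bodies equal to the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def ratio_to_baseline(new_score: float, baseline_score: float) -> float:
    """How many times the baseline score the new score is (e.g. 3.0 means 3X)."""
    return new_score / baseline_score


# Toy inputs, invented for this sketch only.
toy_predictions = ["return a + b;", "return  x;\n", "return 0;"]
toy_references = ["return a + b;", "return x;", "return 1;"]
toy_rate = exact_match_rate(toy_predictions, toy_references)
```

With these toy inputs, two of the three predictions match their references once whitespace is normalized.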
Problem

Research questions and friction points this paper is trying to address.

Enhancing method body completion in large code repositories
Integrating repository-specific elements for accurate code generation
Improving performance over existing repository-level completion approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAMBO uses RAG for method body completion
Retrieves repository-specific elements for context
Integrates custom APIs and dependencies accurately
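
The idea summarized above (retrieve repository-specific elements and their usages, then hand them to a code LLM) can be sketched roughly as follows. This is an illustrative sketch only, not RAMBO's implementation: the `RepoElement` type, the lexical-overlap ranking, and the prompt layout are hypothetical stand-ins for the paper's element identification and usage retrieval steps, and the LLM call itself is omitted.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepoElement:
    """A repository-specific element: a class, method, or field (hypothetical type)."""
    name: str
    kind: str                 # "class", "method", or "field"
    declaration: str          # source text of the declaration
    usages: List[str] = field(default_factory=list)  # snippets where it is used


def tokenize(text: str) -> Set[str]:
    """Split identifiers into lowercase sub-tokens (handles camelCase and snake_case)."""
    return {w.lower() for w in re.findall(r"[A-Za-z][a-z0-9]*", text)}


def rank_elements(signature: str, elements: List[RepoElement], top_k: int = 2) -> List[RepoElement]:
    """Rank elements by sub-token overlap with the method signature.

    Assumption: plain lexical overlap stands in for the paper's
    semantic-aware element identification.
    """
    query = tokenize(signature)
    scored = []
    for element in elements:
        overlap = len(query & tokenize(element.name))
        if overlap:
            scored.append((overlap, element))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [element for _, element in scored[:top_k]]


def build_prompt(signature: str, elements: List[RepoElement]) -> str:
    """Place element declarations and usages as comments before the open method."""
    parts = ["// Relevant repository elements and how they are used:"]
    for element in elements:
        parts.append(f"// {element.kind}: {element.declaration}")
        for usage in element.usages:
            parts.append(f"//   usage: {usage}")
    parts.append(signature + " {")
    return "\n".join(parts)


# Hypothetical repository contents, invented for this sketch.
repo = [
    RepoElement("UserRepository", "class", "class UserRepository",
                ["userRepository.findById(id)"]),
    RepoElement("OrderService", "class", "class OrderService"),
    RepoElement("findById", "method", "User findById(long id)",
                ["repo.findById(42L)"]),
]
signature = "public User findUserById(long id)"
selected = rank_elements(signature, repo)
prompt = build_prompt(signature, selected)
```

Here `prompt` would be passed to a code LLM to complete the method body. The point of the design is that the model sees declarations and real call sites of the elements it is likely to need, rather than bodies of superficially similar methods.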
Tuan-Dung Bui
Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
Duc-Thieu Luu-Van
Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
Thanh-Phat Nguyen
Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
Thu-Trang Nguyen
VNU University of Engineering and Technology
Automated Software Engineering, Program Analysis, Code Generation, AI
Son Nguyen
Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
Hieu Dinh Vo
VNU
Software architecture, Program analysis