RAMBO: Enhancing RAG-based Repository-Level Method Body Completion

📅 2024-09-23

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

To address cross-module dependencies, custom API usage, and project-specific conventions in Method Body Completion (MBC) for large code repositories, this paper proposes RAMBO, a repository-level, semantic-context-aware Retrieval-Augmented Generation (RAG) framework. Conventional approaches retrieve similar method bodies. RAMBO instead retrieves key repository elements (classes, methods, and variables/fields) together with their actual usages within the repository. It combines semantic-aware element identification, context-aware usage aggregation, and generation with large language models. In experiments with leading code LLMs across 40 Java projects, RAMBO outperforms state-of-the-art repository-level MBC approaches by up to 46% in BLEU, 57% in CodeBLEU, and 36% in compilation rate, and by up to 3x in exact match. It also surpasses RepoCoder Oracle by up to 12% in exact match.
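The stages named above (identify repository elements, aggregate their usages, then generate) can be illustrated with a minimal Python sketch. It is not RAMBO's implementation. The regex-based declaration finder, the rule that a referenced class pulls in the members declared in its file, the helper names (`index_repository`, `identify_elements`, `collect_usages`, `build_prompt`) and the toy repository are all hypothetical. The sketch stops at prompt assembly, and the call to a code LLM is not shown.

```python
import re

# Naive regex "parsers" for Java declarations (an assumption for illustration;
# a real tool would use a proper Java parser).
DECLS = {
    "class": re.compile(r"\bclass\s+([A-Z]\w*)"),
    "method": re.compile(r"\b[\w<>\[\]]+\s+([a-z]\w*)\s*\([^)]*\)\s*\{"),
    "field": re.compile(
        r"\b(?:private|protected|public)\s+(?:static\s+)?(?:final\s+)?"
        r"[\w<>\[\]]+\s+([a-z]\w*)\s*[;=]"
    ),
}
KEYWORDS = {"if", "for", "while", "switch", "catch", "return", "new"}

# Hypothetical toy repository (file path -> Java source), not from the paper.
EXAMPLE_REPO = {
    "Item.java": "public class Item {\n"
                 "    private double price;\n"
                 "    public double getPrice() { return price; }\n}\n",
    "Cart.java": "public class Cart {\n"
                 "    private List<Item> items;\n"
                 "    public void add(Item item) { items.add(item); }\n}\n",
    "Report.java": "class Report {\n"
                   "    double sum(List<Item> xs) {\n"
                   "        double s = 0;\n"
                   "        for (Item i : xs) { s += i.getPrice(); }\n"
                   "        return s;\n    }\n}\n",
}


def index_repository(repo: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each declared name to (kind, file) using the naive regexes."""
    symbols: dict[str, tuple[str, str]] = {}
    for path, source in repo.items():
        for kind, pattern in DECLS.items():
            for name in pattern.findall(source):
                if name not in KEYWORDS:
                    symbols.setdefault(name, (kind, path))
    return symbols


def identify_elements(context: str, symbols: dict) -> list[str]:
    """Stage 1: repo symbols named in the context, plus members of named classes."""
    found: list[str] = []
    for tok in re.findall(r"[A-Za-z_]\w*", context):
        if tok in symbols and tok not in found:
            found.append(tok)
    for name in list(found):  # snapshot: only directly referenced classes expand
        kind, path = symbols[name]
        if kind == "class":
            for member, (_, member_path) in symbols.items():
                if member_path == path and member not in found:
                    found.append(member)
    return found


def collect_usages(element: str, repo: dict[str, str], max_usages: int = 2) -> list[str]:
    """Stage 2: lines that mention the element but do not declare it."""
    word = re.compile(rf"\b{re.escape(element)}\b")
    usages = []
    for path, source in repo.items():
        for line in source.splitlines():
            declares = any(element in p.findall(line) for p in DECLS.values())
            if word.search(line) and not declares:
                usages.append(f"{path}: {line.strip()}")
    return usages[:max_usages]


def build_prompt(signature: str, context: str, repo: dict[str, str]) -> str:
    """Stage 3 input: elements and usages, then the context and open signature."""
    symbols = index_repository(repo)
    lines = ["// Repository elements and their usages:"]
    for name in identify_elements(context + "\n" + signature, symbols):
        kind, path = symbols[name]
        lines.append(f"// {kind} {name} (declared in {path})")
        lines.extend(f"//   {u}" for u in collect_usages(name, repo))
    lines.append(context)
    lines.append(signature + " {")
    return "\n".join(lines)


if __name__ == "__main__":
    ctx = "class Checkout {\n    private Cart cart;\n    private Item bonus;"
    print(build_prompt("public double total()", ctx, EXAMPLE_REPO))
```

The design choice worth noting is that each retrieved element is paired with how the repository actually uses it (for example, `Report.java` calling `i.getPrice()`), rather than with a whole method body that merely looks similar.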

📝 Abstract

Code completion is essential in software development, helping developers by predicting code snippets based on context. Among completion tasks, Method Body Completion (MBC) is particularly challenging as it involves generating complete method bodies based on their signatures and context. This task becomes significantly harder in large repositories, where method bodies must integrate repository-specific elements such as custom APIs, inter-module dependencies, and project-specific conventions. In this paper, we introduce RAMBO, a novel RAG-based approach for repository-level MBC. Instead of retrieving similar method bodies, RAMBO identifies essential repository-specific elements, such as classes, methods, and variables/fields, and their relevant usages. By incorporating these elements and their relevant usages into the code generation process, RAMBO ensures more accurate and contextually relevant method bodies. Our experimental results with leading code LLMs across 40 Java projects show that RAMBO significantly outperformed the state-of-the-art repository-level MBC approaches, with improvements of up to 46% in BLEU, 57% in CodeBLEU, 36% in Compilation Rate, and up to 3X in Exact Match. Notably, RAMBO surpassed the RepoCoder Oracle method by up to 12% in Exact Match, setting a new benchmark for repository-level MBC.
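To make the reported numbers concrete, here is a minimal Python sketch of two of the quantities involved: an exact-match rate over generated method bodies and the percentage gain of one score over another. The whitespace normalization and the function names (`normalize`, `exact_match_rate`, `relative_improvement`) are assumptions for illustration. The abstract does not state how bodies are normalized, nor whether its percentages are relative gains or absolute points. BLEU, CodeBLEU and compilation rate need their own standard tooling and are not reproduced here.

```python
def normalize(body: str) -> str:
    """Collapse whitespace runs (an assumed normalization, not from the paper)."""
    return " ".join(body.split())


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Share of generated method bodies equal to the reference after normalization."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty lists")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (baseline must be non-zero)."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical predictions and references, not data from the paper.
    refs = ["return a + b;", "return items.size();", "x++;"]
    ours = ["return a + b;", "return  items.size();", "x += 1;"]
    base = ["return a - b;", "return items.size();", "x += 1;"]
    em_ours, em_base = exact_match_rate(ours, refs), exact_match_rate(base, refs)
    print(em_ours, em_base, relative_improvement(em_ours, em_base))
```

Read this way, "3X in Exact Match" is a ratio of 3 between the two exact-match rates, which corresponds to a relative improvement of 200%.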

Problem

Research questions and friction points this paper is trying to address.

Enhancing method body completion in large code repositories

Integrating repository-specific elements for accurate code generation

Improving performance over existing repository-level completion approaches

Innovation

Methods, ideas, or system contributions that make the work stand out.

RAMBO uses RAG for method body completion

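Retrieves repository-specific elements (classes, methods, variables/fields) and their usages instead of similar method bodies

Incorporates custom APIs and inter-module dependencies into the generated method bodies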
