Enhancing Repository-Level Code Generation with Call Chain-Aware Multi-View Context

📅 2025-07-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing repository-level code generation methods struggle to identify the context that is actually relevant, and their prompt construction ignores the structural relationships in the retrieved code, which limits how well large language models can interpret that context. To address this, the authors propose RepoScope, a framework that uses static analysis to build a Repository Structural Semantic Graph (RSSG) and retrieves a four-view context combining structural and similarity-based information. It adds a call chain prediction method for identifying the callees of the target function, and a structure-preserving serialization algorithm for prompt construction. RepoScope requires no additional training and no multiple LLM queries. On the CoderEval and DevEval benchmarks, it achieves up to a 36.35% relative improvement in pass@1 over state-of-the-art methods, and further experiments indicate that it generalizes across different tasks and integrates with existing approaches.

📝 Abstract

Repository-level code generation aims to generate code within the context of a specified repository. Existing approaches typically employ retrieval-augmented generation (RAG) techniques to provide LLMs with relevant contextual information extracted from the repository. However, these approaches often struggle to identify truly relevant contexts that capture the rich semantics of the repository, and their contextual perspectives remain narrow. Moreover, most approaches fail to account for the structural relationships in the retrieved code during prompt construction, hindering the LLM's ability to accurately interpret the context. To address these issues, we propose RepoScope, which leverages call chain-aware multi-view context for repository-level code generation. RepoScope constructs a Repository Structural Semantic Graph (RSSG) and retrieves a comprehensive four-view context, integrating both structural and similarity-based contexts. We propose a novel call chain prediction method that utilizes the repository's structural semantics to improve the identification of callees in the target function. Additionally, we present a structure-preserving serialization algorithm for prompt construction, ensuring the coherence of the context for the LLM. Notably, RepoScope relies solely on static analysis, eliminating the need for additional training or multiple LLM queries, thus ensuring both efficiency and generalizability. Evaluation on widely used repository-level code generation benchmarks (CoderEval and DevEval) demonstrates that RepoScope outperforms state-of-the-art methods, achieving up to a 36.35% relative improvement in pass@1 scores. Further experiments emphasize RepoScope's potential to improve code generation across different tasks and its ability to integrate effectively with existing approaches.
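To make the idea of static, call chain-aware context concrete, here is a minimal sketch using Python's standard `ast` module. It is not RepoScope's RSSG or its call chain prediction method, which the abstract does not specify in detail; the helper names `build_call_graph` and `call_chain` and the toy `EXAMPLE` source are assumptions made for illustration. The sketch only records direct calls to plain names inside top-level functions and then collects callees within a fixed number of hops.

```python
import ast
from collections import defaultdict


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly.

    Hypothetical helper: attribute calls such as `text.split()` are ignored.
    """
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the function even if it calls nothing
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)


def call_chain(graph: dict[str, set[str]], start: str, depth: int = 2) -> list[str]:
    """Collect callees reachable from `start` within `depth` hops, breadth-first."""
    seen: list[str] = []
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for fn in frontier:
            for callee in sorted(graph.get(fn, ())):
                if callee != start and callee not in seen:
                    seen.append(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return seen


# Toy repository content, invented for this example.
EXAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def run(path):
    return parse(load(path))
'''

graph = build_call_graph(EXAMPLE)
print(call_chain(graph, "run"))  # ['load', 'parse', 'open']
```

A real system would also need to resolve imports, methods, and cross-file references; this sketch deliberately stays within a single source string.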

Problem

Research questions and friction points this paper is trying to address.

Improving repository-level code generation with multi-view context

Enhancing context relevance using call chain-aware structural semantics

Optimizing prompt construction for better LLM interpretation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses call chain-aware multi-view context

Constructs Repository Structural Semantic Graph

Employs structure-preserving serialization algorithm
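One way to picture structure-preserving serialization is to keep retrieved snippets grouped by file and in source order, rather than concatenating them in retrieval-score order. The following is a rough sketch under that assumption, not the paper's algorithm; the `Snippet` type, the `serialize` function, and the `# file:` header format are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Snippet:
    """A retrieved code fragment (hypothetical type for this sketch)."""
    path: str    # file the fragment came from
    lineno: int  # starting line in that file
    code: str    # fragment text


def serialize(snippets: list[Snippet]) -> str:
    """Emit snippets grouped by file and ordered by line number."""
    ordered = sorted(snippets, key=lambda s: (s.path, s.lineno))
    parts: list[str] = []
    for path, group in groupby(ordered, key=lambda s: s.path):
        parts.append(f"# file: {path}")  # one header per file
        parts.extend(s.code for s in group)
    return "\n".join(parts)


# Retrieval order is arbitrary; the prompt text follows repository layout.
retrieved = [
    Snippet("b.py", 1, "def g(): ..."),
    Snippet("a.py", 10, "def f2(): ..."),
    Snippet("a.py", 2, "def f1(): ..."),
]
print(serialize(retrieved))
```

Sorting before `groupby` matters because `groupby` only merges adjacent items with equal keys.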

🔎 Similar Papers

Enhancing Repository-Level Code Generation with Integrated Contextual Information

2024-06-05, arXiv.org, Citations: 10

On the Impacts of Contexts on Repository-Level Code Generation

2024-06-17, Citations: 1