AI Summary
In cross-file code completion, retrieved contextual snippets exhibit heterogeneous quality, with some degrading model performance. To address this, we propose CODEFILTER, the first influence-aware context filtering framework. Our approach adaptively identifies and removes harmful context snippets via a likelihood-based influence assessment mechanism and supervised context classification. To support this, we construct the first repository-level influence-annotated dataset. CODEFILTER operates within the RAG paradigm to enable efficient and interpretable context refinement. Experiments demonstrate that CODEFILTER significantly improves code completion accuracy on RepoEval and CrossCodeLongEval, reduces average input length by 32%, lowers computational overhead, and exhibits strong generalization across diverse large language models.
Abstract
Retrieval-augmented generation (RAG) has recently demonstrated considerable potential for repository-level code completion, as it integrates cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation. To better understand the contribution of the retrieved cross-file contexts, we introduce a likelihood-based metric to evaluate the impact of each retrieved code chunk on the completion. Our analysis reveals that, despite retrieving numerous chunks, only a small subset positively contributes to the completion, while some chunks even degrade performance. To address this issue, we leverage this metric to construct a repository-level dataset where each retrieved chunk is labeled as positive, neutral, or negative based on its relevance to the target completion. We then propose an adaptive retrieval context filtering framework, CODEFILTER, trained on this dataset to mitigate the harmful effects of negative retrieved contexts in code completion. Extensive evaluation on the RepoEval and CrossCodeLongEval benchmarks demonstrates that CODEFILTER consistently improves completion accuracy compared to approaches without filtering operations across various tasks. Additionally, CODEFILTER significantly reduces the length of the input prompt, enhancing computational efficiency while exhibiting strong generalizability across different models. These results underscore the potential of CODEFILTER to enhance the accuracy, efficiency, and attributability of repository-level code completion.
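The likelihood-based labeling described above can be sketched as follows. The paper's exact scoring formulation is not reproduced here, so the function names, the use of a log-likelihood delta, and the threshold `eps` are all illustrative assumptions: the idea is that a chunk is labeled by how much including it changes the model's likelihood of the target completion.

```python
# Hypothetical sketch of likelihood-based influence labeling for retrieved
# code chunks. The scoring function, threshold, and names are assumptions,
# not the paper's exact formulation.

def influence_label(log_lik_with: float, log_lik_without: float,
                    eps: float = 0.1) -> str:
    """Label a retrieved chunk by the change in target log-likelihood
    when the chunk is added to the completion context."""
    delta = log_lik_with - log_lik_without
    if delta > eps:
        return "positive"   # chunk helps the model predict the target
    if delta < -eps:
        return "negative"   # chunk hurts the prediction
    return "neutral"        # negligible effect either way

def label_chunks(chunk_scores: list[tuple[str, float]],
                 baseline_log_lik: float) -> dict[str, str]:
    """Map each chunk id to its label, given the log-likelihood of the
    target completion with that chunk included vs. the in-file-only baseline."""
    return {cid: influence_label(ll_with, baseline_log_lik)
            for cid, ll_with in chunk_scores}
```

A filtering model trained on such labels can then drop the "negative" (and optionally "neutral") chunks at inference time, shortening the prompt while keeping the helpful context.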