RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion

2026-01-06
arXiv.org
AI Summary
This work addresses repository-level code completion, where retrieved cross-file context often degrades generation quality because chunks are irrelevant or conflict with one another. The authors propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions that are computed offline. Its ChunkShapley module builds labels in four steps: single-chunk probing with teacher-forced likelihood, a surrogate game that captures saturation and interference, exact Shapley computation for small retrieval sets, and bounded post-verification that selects a decoding-optimal coalition using the frozen generator. The verified keep/drop decisions and the retrieval trigger are then distilled into a single model. Experiments across several benchmarks and backbone models are reported to improve completion quality while reducing harmful context and unnecessary retrieval.
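For reference, the textbook Shapley value that "Shapley-style marginal contributions" builds on can be written as follows. Here $N$ is the set of retrieved chunks and $v(S)$ is the utility of supplying coalition $S$ as context; the exact utility and weighting the authors use are not given in this summary, so $v$ is only a placeholder.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \,\bigl( v(S \cup \{i\}) - v(S) \bigr)
```

The sum ranges over all $2^{|N|-1}$ coalitions that exclude chunk $i$, which is why exact computation is practical only for small retrieval sets.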

Abstract
Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our module ChunkShapley constructs offline labels by (i) single-chunk probing with teacher-forced likelihood to estimate signed, weighted effects, (ii) a surrogate game that captures saturation and interference, (iii) exact Shapley computation for small retrieval sets, and (iv) bounded post-verification that selects a decoding-optimal coalition using the frozen generator. We distill verified $KEEP$ or $DROP$ decisions and retrieval triggering into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval. Code: https://anonymous.4open.science/r/a7f3c9.
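Step (iii) of the abstract, exact Shapley computation over a small retrieval set, can be illustrated with a minimal sketch. The function names, the three chunk names, and the toy value function below are all hypothetical; in particular, `toy_value` is an invented stand-in for the paper's surrogate game, whose actual form is not described here. It only mimics two interaction effects the abstract mentions: chunks that help more when paired, and chunks that interfere.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating every coalition.

    Cost grows as 2^n, so this is only feasible for small sets.
    `value` maps a frozenset of players to a float.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical single-chunk effects (assumption, not from the paper).
SINGLE = {"a": 1.0, "b": 1.0, "c": -0.5}


def toy_value(coalition):
    """Invented utility: additive effects plus two pairwise interactions."""
    total = sum(SINGLE[c] for c in coalition)
    if {"a", "b"} <= coalition:
        total += 1.0  # a and b are complementary
    if {"a", "c"} <= coalition:
        total -= 0.6  # c interferes with a
    return total


phi = exact_shapley(["a", "b", "c"], toy_value)
# Simplification: threshold at zero. The paper instead verifies the
# chosen coalition by decoding with the frozen generator.
keep = sorted(p for p, score in phi.items() if score > 0)
print(phi, keep)
```

In this toy game each pairwise bonus or penalty is split evenly between the two chunks involved, so chunk `c` ends up with a negative score and is the only one dropped by the sign rule.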
Problem

Research questions and friction points this paper is trying to address.

repository-level code completion
retrieval-augmented generation
context filtering
cross-file evidence
harmful context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley value
repository-level code completion
context filtering
retrieval-augmented generation
coalition-aware
Authors
Yu Huo
School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Longgang, Shenzhen, Guangdong, 518172, P.R. China
Siyu Zhang
4DV.ai
Computer Vision
Kun Zeng
Dongfang Electric Corporation Dongfang Boiler Co.,ltd.
magnetic domain, NDE, magnetism, magnetic microstructure, boiler
Yuquan Lu
Guangdong Provincial Key Laboratory of Future Networks of Intelligence
Cheng Yang
Hangzhou Dianzi University
Yifu Guo
Sun Yat-sen University
Xiaoying Tang
School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Longgang, Shenzhen, Guangdong, 518172, P.R. China