AI Summary
This work addresses the limitations of current large code models in repository-level code completion: they struggle to effectively leverage repository-specific context and domain knowledge. Traditional retrieval-augmented approaches often suffer from semantic mismatches between queries and target code and neglect reasoning information, leading to suboptimal performance. To overcome these challenges, we propose AlignCoder, a novel framework that constructs enhanced queries by generating multiple candidate completions to bridge the semantic gap. Furthermore, we introduce AlignRetriever, a reinforcement learning-driven retriever that exploits reasoning cues embedded in the candidates to enable more accurate cross-file retrieval. Evaluated on CrossCodeEval and RepoEval, our method significantly outperforms existing baselines, achieving an 18.1% absolute improvement in exact match (EM) scores and demonstrating strong generalization across diverse code large language models and programming languages.
Abstract
Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code during retrieval, and the inability of existing retrieval methods to effectively utilize inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement learning-based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train AlignRetriever, which learns to leverage the inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score over baselines on CrossCodeEval. The results show that our framework achieves superior performance and generalizes well across code LLMs and programming languages.
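The pipeline described above (sample candidate completions, fold them into an enhanced query, then retrieve cross-file context with that query) can be sketched in miniature as follows. Everything here is a hedged illustration, not the paper's implementation: `generate_candidates` stands in for sampling from a code LLM, and the token-overlap scorer stands in for the RL-trained AlignRetriever.

```python
def generate_candidates(prompt, n=3):
    # Stand-in for sampling n candidate completions from a code LLM.
    # Real candidates would be model-generated code continuations.
    return [f"{prompt} <candidate_{i}>" for i in range(n)]

def build_enhanced_query(prompt, candidates):
    # Enhanced query = in-file context plus candidate completions,
    # narrowing the semantic gap between the query and the target code.
    return prompt + "\n" + "\n".join(candidates)

def retrieve(query, corpus, k=2):
    # Toy lexical retriever: rank repository snippets by token overlap
    # with the enhanced query. AlignCoder instead trains a retriever
    # with reinforcement learning to exploit the inference information.
    q_tokens = set(query.split())
    scored = sorted(
        corpus,
        key=lambda snippet: len(q_tokens & set(snippet.split())),
        reverse=True,
    )
    return scored[:k]

if __name__ == "__main__":
    repo_snippets = ["def foo(): pass", "class Bar: pass", "import os"]
    prompt = "def foo"
    candidates = generate_candidates(prompt)
    enhanced = build_enhanced_query(prompt, candidates)
    print(retrieve(enhanced, repo_snippets, k=1))
```

The retrieved snippets would then be prepended to the prompt as cross-file context for the final completion pass; that second generation step is omitted here for brevity.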