CoRet: Improved Retriever for Code Editing

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low recall when retrieving relevant code from large repositories given natural language queries (e.g., requests to implement a feature or fix a bug), this paper proposes CoRet, a repository-level dense retrieval model. The model integrates code semantics, repository structure, and call graph dependencies, and is trained with a loss function designed explicitly for repository-level retrieval. Contributions include: (1) a loss function tailored to repository-level code retrieval; (2) retrieval of relevant code chunks that can be presented to a user or passed to a downstream code-editing model or agent; and (3) retrieval recall at least 15 percentage points higher than existing models on SWE-bench and Long Code Arena's bug localisation datasets. Ablation studies show the importance of the individual design choices.

📝 Abstract
In this paper, we introduce CoRet, a dense retrieval model designed for code-editing tasks that integrates code semantics, repository structure, and call graph dependencies. The model focuses on retrieving relevant portions of a code repository based on natural language queries such as requests to implement new features or fix bugs. These retrieved code chunks can then be presented to a user or to a second code-editing model or agent. To train CoRet, we propose a loss function explicitly designed for repository-level retrieval. On SWE-bench and Long Code Arena's bug localisation datasets, we show that our model substantially improves retrieval recall by at least 15 percentage points over existing models, and ablate the design choices to show their importance in achieving these results.
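As a rough illustration of what dense retrieval and retrieval recall mean in this setting, the sketch below ranks code chunks by cosine similarity to a query embedding and computes recall at k. It is not CoRet's implementation: the chunk identifiers, the three-dimensional vectors, and the function names `retrieve` and `recall_at_k` are all hypothetical, and the recall definition used here (fraction of relevant chunks found in the top k) is one common choice that may differ from the paper's exact metric.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, k):
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(retrieved, relevant):
    """Fraction of relevant chunks that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy repository: hypothetical chunk ids mapped to made-up embeddings.
chunks = {
    "auth/login.py::check_password": [0.9, 0.1, 0.0],
    "auth/session.py::refresh": [0.6, 0.4, 0.1],
    "ui/render.py::draw": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded bug report
gold = {"auth/login.py::check_password", "auth/session.py::refresh"}

top2 = retrieve(query, chunks, k=2)
print(top2, recall_at_k(top2, gold))
```

In a real system the vectors would come from a trained encoder, and the ranking would typically use an approximate nearest-neighbour index rather than a full sort.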
Problem

Research questions and friction points this paper is trying to address.

Low retrieval recall for code-editing tasks
Integrating code semantics, repository structure, and call graph dependencies in one retriever
Locating the code relevant to a bug fix or feature request from a natural language query
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense retrieval model for code-editing tasks
Integrates code semantics, repository structure, and call graph dependencies
Novel loss function for repository-level retrieval
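
The text here does not spell out the form of the loss, so the following is only one plausible reading, not the paper's definition: a generic InfoNCE-style contrastive loss in which every chunk of a single repository is scored against the query, the edited chunks are the positives, and all other chunks in that repository act as negatives. The function name `repo_contrastive_loss`, the temperature value, and the toy scores are assumptions made for illustration.

```python
import math


def repo_contrastive_loss(scores, positive_ids, temperature=0.05):
    """Softmax cross-entropy over all chunks of one repository.

    scores: dict mapping chunk id -> query/chunk similarity (hypothetical values).
    positive_ids: ids of the chunks that should be retrieved for the query.
    """
    logits = {cid: s / temperature for cid, s in scores.items()}
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(l - m) for l in logits.values()))
    # Average negative log-likelihood of the positive chunks.
    return -sum(logits[p] - log_z for p in positive_ids) / len(positive_ids)


# The positive chunk "a" scores high: the loss is small.
good = repo_contrastive_loss({"a": 0.9, "b": 0.1, "c": 0.1, "d": 0.1}, ["a"])
# The positive chunk "a" scores low: the loss is large.
bad = repo_contrastive_loss({"a": 0.1, "b": 0.9, "c": 0.9, "d": 0.9}, ["a"])
print(good < bad)
```

Normalising over the chunks of one repository, rather than over a mixed batch drawn from many repositories, is what makes this sketch repository-level; whether CoRet does exactly this is not stated in the text above.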