Improving Code Localization with Repository Memory

📅 2025-10-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing code localization methods give agents no counterpart to the long-term memory that developers build of a codebase, such as module functionality and associations between defect types and their likely fix locations, so each task is reasoned about from scratch. Method: We propose a non-parametric, repository-level long-term memory grounded in commit history. It draws on recent historical commits, their linked issues, and functionality summaries of actively evolving modules identified via commit patterns, and exposes them to the agent as a retrievable external memory store. This memory is integrated into the LocAgent framework, enabling memory-augmented localization reasoning. Contribution/Results: Our approach significantly improves LocAgent, a state-of-the-art localization framework, on both the SWE-bench-verified and SWE-bench-live benchmarks, supporting the claim that long-term memory improves performance on complex software engineering tasks.

📝 Abstract

Code localization is a fundamental challenge in repository-level software engineering tasks such as bug fixing. While existing methods equip language agents with comprehensive tools and interfaces to fetch information from the repository, they overlook the critical aspect of memory: each instance is typically handled from scratch, assuming no prior repository knowledge. In contrast, human developers naturally build long-term repository memory, such as the functionality of key modules and associations between various bug types and their likely fix locations. In this work, we augment language agents with such memory by leveraging a repository's commit history, a rich yet underutilized resource that chronicles the codebase's evolution. We introduce tools that allow the agent to retrieve from a non-parametric memory encompassing recent historical commits and linked issues, as well as functionality summaries of actively evolving parts of the codebase identified via commit patterns. We demonstrate that augmenting the agent with such a memory can significantly improve LocAgent, a state-of-the-art localization framework, on both SWE-bench-verified and the more recent SWE-bench-live benchmarks. Our research contributes towards developing agents that can accumulate and leverage past experience for long-horizon tasks, more closely emulating the expertise of human developers.
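
To make the idea of retrieving from a non-parametric commit memory concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `CommitRecord` fields, the token-overlap scoring, the `retrieve` and `candidate_files` helpers, and the sample data are all assumptions chosen for illustration. A real system would likely use a stronger retriever and actual commit and issue data.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class CommitRecord:
    """One hypothetical memory entry: a past commit and its linked issue."""
    sha: str
    message: str
    issue_text: str
    changed_files: list


def tokenize(text):
    # Lowercase word tokens; a deliberately simple stand-in for a real retriever.
    return re.findall(r"[a-z0-9_]+", text.lower())


def retrieve(memory, query, k=2):
    """Return up to k records whose commit/issue text overlaps most with the query."""
    query_tokens = Counter(tokenize(query))
    scored = []
    for record in memory:
        doc_tokens = Counter(tokenize(record.message + " " + record.issue_text))
        overlap = sum(min(query_tokens[t], doc_tokens[t]) for t in query_tokens)
        if overlap > 0:
            scored.append((overlap, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [record for _, record in scored[:k]]


def candidate_files(memory, query, k=2):
    """Rank files by how often they were changed in the retrieved commits."""
    votes = Counter()
    for record in retrieve(memory, query, k):
        votes.update(record.changed_files)
    return [path for path, _ in votes.most_common()]


# Illustrative, made-up memory entries.
MEMORY = [
    CommitRecord("a1", "Fix crash when parsing empty config file",
                 "Parser raises KeyError on empty config",
                 ["app/config/parser.py"]),
    CommitRecord("b2", "Handle timeout in HTTP client retries",
                 "Requests hang when server is slow",
                 ["app/net/client.py", "app/net/retry.py"]),
    CommitRecord("c3", "Validate config keys before parsing",
                 "Unknown config key silently ignored",
                 ["app/config/parser.py", "app/config/schema.py"]),
]

QUERY = "KeyError raised while parsing config with missing section"
print(candidate_files(MEMORY, QUERY))
```

In this toy setup, a new issue about a `KeyError` during config parsing retrieves the two config-related commits, and the files they touched become localization candidates for the agent to inspect.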

Problem

Research questions and friction points this paper is trying to address.

Enhancing code localization using repository commit history memory

Addressing the lack of prior knowledge in repository-level software tasks

Improving bug fixing by leveraging historical commits and issue links

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages repository commit history for memory

Retrieves from a non-parametric memory of recent commits and linked issues

Uses commit patterns to identify active code parts

🔎 Similar Papers

BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning