One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Precise localization of target files and functions in large open-source codebases is challenging due to scale, structural complexity, and implicit execution logic. Existing LLM-based approaches rely on multi-tool orchestration, neglect execution flow, and suffer from high control complexity. Method: We propose RepoNavigator, the first LLM agent grounded in a single *symbol-definition-jump* tool and execution-logic-driven reasoning. It models code execution flow explicitly as a sequence of tool invocations and introduces the first end-to-end PPO-based reinforcement learning framework trained directly from a pretrained model, eliminating reliance on closed-source distillation. The method integrates repository-level contextual modeling with a lightweight tool state machine. Results: RepoNavigator's 7B model outperforms a 14B baseline, its 14B variant surpasses a 32B competing method, and its 32B model achieves state-of-the-art results across multiple defect-localization benchmarks, significantly improving both accuracy and efficiency.

📝 Abstract
Locating the files and functions requiring modification in large open-source software (OSS) repositories is challenging due to their scale and structural complexity. Existing large language model (LLM)-based methods typically treat this as a repository-level retrieval task and rely on multiple auxiliary tools, which overlooks code execution logic and complicates model control. We propose RepoNavigator, an LLM agent equipped with a single execution-aware tool: jumping to the definition of an invoked symbol. This unified design reflects the actual flow of code execution while simplifying tool manipulation. RepoNavigator is trained end-to-end via Reinforcement Learning (RL) directly from a pretrained model, without any closed-source distillation. Experiments demonstrate that RL-trained RepoNavigator achieves state-of-the-art performance, with the 7B model outperforming 14B baselines, the 14B model surpassing 32B competitors, and even the 32B model exceeding closed-source models such as Claude-3.7. These results confirm that integrating a single, structurally grounded tool with RL training provides an efficient and scalable solution for repository-level issue localization.
Problem

Research questions and friction points this paper is trying to address.

Locating files and functions needing modification in large OSS repositories
Overcoming scale and structural complexity in repository-level retrieval
Simplifying tool use while reflecting code execution logic
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single execution-aware tool for code navigation
End-to-end reinforcement learning from pretrained models
Unified design reflecting actual code execution flow
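The paper does not publish the tool's interface, but a symbol-definition-jump tool for Python repositories can be sketched with the standard library alone. Everything below (the function name, return shape, and lookup behavior) is an illustrative assumption, not the authors' implementation:

```python
import ast
from pathlib import Path

def jump_to_definition(repo_root, symbol):
    """Illustrative sketch of a symbol-definition-jump tool.

    Scan every .py file under repo_root and return (file, line, source)
    for the first function or class definition named `symbol`, else None.
    """
    for path in sorted(Path(repo_root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            tree = ast.parse(source)
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef))
                    and node.name == symbol):
                return str(path), node.lineno, ast.get_source_segment(source, node)
    return None
```

An agent built around this single tool would emit a symbol name at each step and receive the definition's source back as context, so a chain of calls traces the repository's execution flow rather than keyword-matched snippets.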