CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of code localization—efficiently identifying relevant files, classes, and functions within large codebases—by proposing a reinforcement learning approach that operates exclusively through a standard Unix terminal. Leveraging existing code agent environments and introducing carefully designed reward mechanisms and optimization strategies, the method enables autonomous navigation and retrieval within code repositories without relying on specialized tools. Evaluated on the SWE-Bench Verified, Pro, and Lite benchmarks, the approach matches or significantly outperforms base or post-trained large language models that are 2–18 times larger in scale, achieving performance comparable to closed-source models such as Claude Sonnet. These results demonstrate the method’s efficiency and practical viability for real-world code intelligence tasks.
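The summary mentions "carefully designed reward mechanisms" for localization. As a purely illustrative sketch (not the paper's actual reward design), one common shape for such a reward is set overlap between the files the agent surfaces and the gold files touched by the reference patch, e.g. an F1 score:

```python
# Hypothetical sketch of a file-level localization reward: F1 overlap
# between the agent's predicted file paths and the gold files.
# Illustrative only -- the paper's actual reward design may differ.

def localization_reward(predicted: set[str], gold: set[str]) -> float:
    """Return the F1 score between predicted and gold file sets."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # true positives: correctly localized files
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

F1 is a natural choice here because it penalizes both missing relevant files (low recall) and flooding the trajectory with irrelevant ones (low precision).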

📝 Abstract
A prerequisite for coding agents to perform tasks on large repositories is code localization - the identification of relevant files, classes, and functions to work on. While repository-level code localization has been performed using embedding-based retrieval approaches such as vector search, recent work has focused on developing agents to localize relevant code either as a standalone precursor to or interleaved with performing actual work. Most prior methods on agentic code search equip the agent with complex, specialized tools, such as repository graphs derived from static analysis. In this paper, we demonstrate that, with an effective reinforcement learning recipe, a coding agent equipped with nothing more than a standard Unix terminal can be trained to achieve strong results. Our experiments on three benchmarks (SWE-Bench Verified, Pro, and Lite) reveal that our models consistently achieve superior or competitive performance over 2-18x larger base and post-trained LLMs and sometimes approach performance provided by closed models like Claude Sonnet, even when using specialized scaffolds. Our work particularly focuses on techniques for re-purposing existing coding agent environments for code search, reward design, and RL optimization. We release the resulting model family, CodeScout, along with all our code and data for the community to build upon.
Problem

Research questions and friction points this paper is trying to address.

code search
code localization
reinforcement learning
coding agents
repository navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning
code search agent
Unix terminal
reward design
code localization