SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of small language models (SLMs) on long-horizon software engineering tasks, where they often fall into action loops and achieve low repair success rates. The authors reformulate the problem as an expert-apprentice collaboration and introduce a sparse but effective expert-assistance mechanism: the small model remains the sole decision-maker and queries an expert only upon detecting stagnation. The approach is optimized through supervised fine-tuning on expert-augmented trajectories and agent-oriented reinforcement learning, complemented by an action-loop suppression strategy. Evaluated on SWE-bench Verified, the method achieves a 42.4% Pass@1 score, a +25.4% improvement over prior best small-model results, while invoking the expert only about four times per task on average and accounting for just 11% of total token usage.
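The control flow described above (the SLM acts alone, monitors itself for stagnation, and escalates to the expert only when stalled) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the stall heuristic (identical actions filling a short window), the `query` interfaces, and all names are assumptions for clarity.

```python
# Hypothetical sketch of SWE-Protégé-style selective expert collaboration.
# Assumed heuristic: a window of identical recent actions signals an action loop.
from collections import deque


class ProtegeAgent:
    def __init__(self, window: int = 3):
        # Short history of recent actions used for stall detection.
        self.recent_actions = deque(maxlen=window)

    def is_stalled(self) -> bool:
        # Treat a full window of identical actions as a degenerate loop.
        return (len(self.recent_actions) == self.recent_actions.maxlen
                and len(set(self.recent_actions)) == 1)

    def step(self, observation, slm_policy, expert):
        hint = None
        if self.is_stalled():
            # Sparse expert call: invoked only when stagnation is detected.
            hint = expert(observation)
        # The SLM remains the sole decision-maker; the expert only advises.
        action = slm_policy(observation, hint)
        self.recent_actions.append(action)
        return action
```

With a toy policy that repeats the same action until given a hint, the agent loops for `window` steps, detects the stall, makes a single expert call, and breaks out, mirroring the sparse-assistance behavior the summary reports (~4 expert calls per task).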

📝 Abstract
Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).
Problem

Research questions and friction points this paper is trying to address.

small language models
software engineering
action looping
resolution rate
SWE-bench
Innovation

Methods, ideas, or system contributions that make the work stand out.

small language models
expert collaboration
software engineering agents
reinforcement learning
SWE-bench