SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

📅 2026-02-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a lightweight, container-free training framework for reinforcement learning-based software engineering agents, addressing the high storage overhead, slow environment initialization, and complex permission management inherent in conventional task-level containerization. By leveraging kernel-level isolation, lightweight workspace management, and environment pre-caching, the framework enables efficient and scalable agent training without relying on containers. Experimental results show that the approach reduces disk usage to approximately 5% of that required by standard container-based solutions and cuts environment setup time to about 25% of the original duration, while maintaining comparable performance. These improvements substantially enhance resource efficiency and training scalability, offering a practical alternative for large-scale reinforcement learning in software engineering.
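The workspace-management and pre-caching ideas in the summary can be illustrated with a minimal sketch. The function name `clone_workspace` and the hard-link strategy are assumptions for illustration, not the paper's actual implementation: a base environment is cached once, and each task gets an isolated directory tree stamped out via hard links, so clones cost only directory entries rather than full copies.

```python
import os
import shutil
import tempfile

def clone_workspace(cache_dir: str, workspace_dir: str) -> None:
    """Clone a pre-cached environment into a per-task workspace.

    Files are hard-linked rather than copied, so the clone costs only
    directory entries; tasks should write to new files (or break links
    first) to avoid mutating the shared cache.
    """
    for root, _, files in os.walk(cache_dir):
        rel = os.path.relpath(root, cache_dir)
        target = os.path.join(workspace_dir, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target, name))

# Build the cache once (here, a stand-in file), then stamp out a
# cheap per-task workspace from it.
cache = tempfile.mkdtemp(prefix="swe-cache-")
with open(os.path.join(cache, "requirements.txt"), "w") as f:
    f.write("pytest\n")

task_ws = tempfile.mkdtemp(prefix="swe-task-")
clone_workspace(cache, task_ws)
print(sorted(os.listdir(task_ws)))  # → ['requirements.txt']

# Teardown is a plain directory removal; the cache is untouched.
shutil.rmtree(task_ws)
```

Because hard links share the underlying data blocks, N task workspaces over one cache consume roughly the disk of a single copy, which is the intuition behind the ~5% disk-usage figure reported above.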

๐Ÿ“ Abstract
Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pre-built container images incur substantial storage overhead, slow environment setup, and require container-management privileges. We propose SWE-MiniSandbox, a lightweight, container-free method that enables scalable RL training of SWE agents without sacrificing isolation. Instead of relying on per-instance containers, SWE-MiniSandbox executes each task in an isolated workspace backed by kernel-level mechanisms, substantially reducing system overhead. It leverages lightweight environment pre-caching techniques to eliminate the need for bulky container images. As a result, our approach lowers disk usage to approximately 5% of that required by container-based pipelines and reduces environment preparation time to about 25% of the container baseline. Empirical results demonstrate that SWE-MiniSandbox achieves evaluation performance comparable to standard container-based pipelines. By removing the dependency on heavy container infrastructure, SWE-MiniSandbox offers a practical and accessible foundation for scaling RL-based SWE agents, particularly in resource-constrained research environments.
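The per-task execution model in the abstract can be sketched in user space. This is a simplified stand-in, not the paper's method: the helper `run_in_workspace` is a hypothetical name, and it only confines a task to its own working directory with a scrubbed environment, whereas the kernel-level mechanisms the abstract refers to (e.g., Linux namespaces) would add genuine filesystem and process isolation on top.

```python
import os
import subprocess
import sys
import tempfile

def run_in_workspace(workspace: str, argv: list) -> subprocess.CompletedProcess:
    """Run one task command confined to its own workspace.

    Minimal user-space confinement: a fresh working directory and a
    scrubbed environment so tasks cannot see each other's variables.
    """
    env = {"PATH": os.environ.get("PATH", ""), "HOME": workspace}
    return subprocess.run(
        argv, cwd=workspace, env=env,
        capture_output=True, text=True, timeout=60,
    )

# Each task gets a throwaway workspace; cleanup is a directory removal.
ws = tempfile.mkdtemp(prefix="swe-task-")
result = run_in_workspace(ws, [sys.executable, "-c", "import os; print(os.getcwd())"])
print(result.stdout.strip())  # the task observes only its own workspace path
```

In the container-free setting, replacing a container launch with a directory change plus environment reset is what makes per-task setup cheap; stronger isolation is delegated to kernel primitives rather than to a container runtime.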
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
software engineering agents
container overhead
environment isolation
scalable training
Innovation

Methods, ideas, or system contributions that make the work stand out.

container-free
reinforcement learning
software engineering agents
lightweight sandboxing
kernel-level isolation