DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder

📅 2026-01-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Reliable Docker environment construction remains a critical bottleneck in training and evaluating software engineering agents. This work formulates Docker build generation as a long-horizon agent task that exercises extended tool use, dependency reasoning, and failure recovery, and that yields supervision signals transferable beyond Docker building. Using a SWE-Factory-style pipeline augmented with a loop-detection controller and a cross-task success memory, the authors collect large-scale, execution-grounded trajectories and train a 30B-A3B model on them. The resulting model reaches 39.72% Fail-to-Pass and 58.28% Commit Rate on Multi-Docker-Eval, and it also improves out-of-distribution performance on SWE-bench Verified, SWE-bench Multilingual, and Terminal-Bench 2.0.

📝 Abstract
Reliable Docker-based environment construction is a dominant bottleneck for scaling execution-grounded training and evaluation of software engineering agents. We introduce DockSmith, a specialized agentic Docker builder designed to address this challenge. DockSmith treats environment construction not merely as a preprocessing step but as a core agentic capability that exercises long-horizon tool use, dependency reasoning, and failure recovery, yielding supervision that transfers beyond Docker building itself. DockSmith is trained on large-scale, execution-grounded Docker-building trajectories produced by a SWE-Factory-style pipeline augmented with a loop-detection controller and a cross-task success memory. A 30B-A3B model trained on these trajectories achieves open-source state-of-the-art performance on Multi-Docker-Eval, with 39.72% Fail-to-Pass and 58.28% Commit Rate. Moreover, DockSmith improves out-of-distribution performance on SWE-bench Verified, SWE-bench Multilingual, and Terminal-Bench 2.0, demonstrating broader agentic benefits of environment construction.
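The abstract names a loop-detection controller and a cross-task success memory but does not describe how either works. The sketch below is one plausible reading, not the authors' implementation: the class names, the `window` and `max_repeats` parameters, and the `(language, build_tool)` memory key are all hypothetical. It flags an agent that keeps retrying the same failing command, and it stores Dockerfiles that built successfully so later tasks with a similar profile can reuse them.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class LoopDetector:
    """Flags an agent that keeps repeating the same failing step (hypothetical design)."""

    window: int = 6       # assumed: number of recent steps inspected
    max_repeats: int = 3  # assumed: identical steps tolerated before flagging
    _recent: deque = field(default_factory=deque)

    def observe(self, action: str, error: str) -> bool:
        """Record one (action, error) step; return True if a loop is detected."""
        self._recent.append((action.strip(), error.strip()))
        if len(self._recent) > self.window:
            self._recent.popleft()  # keep only the last `window` steps
        counts = Counter(self._recent)
        return counts[self._recent[-1]] >= self.max_repeats


@dataclass
class SuccessMemory:
    """Stores Dockerfiles that built successfully, keyed by a task profile (hypothetical)."""

    _store: dict = field(default_factory=dict)

    def record(self, key: tuple, dockerfile: str) -> None:
        """Remember a Dockerfile that worked for tasks matching `key`."""
        self._store.setdefault(key, []).append(dockerfile)

    def lookup(self, key: tuple) -> list:
        """Return a copy of the known-good Dockerfiles for `key` (empty if none)."""
        return list(self._store.get(key, []))
```

In a build loop, a controller of this kind would call `observe` after every tool invocation and, once it returns `True`, push the agent toward a different strategy, for example by consulting `lookup` for a Dockerfile that worked on an earlier repository with the same language and build tool.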
Problem

Research questions and friction points this paper is trying to address.

Docker
software engineering agents
environment construction
execution-grounded training
reliable coding environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Docker Builder
Environment Construction
Execution-Grounded Training
Failure Recovery
Cross-Task Memory