🤖 AI Summary
This work addresses a common failure in deploying in-house code generation agents: they often stall between prototype and production because model capabilities are misaligned with real-world engineering requirements. The authors present CodeGen, an internal coding agent developed at Zup, which improves reliability and team adoption through string-replacement-based editing, multi-layered safety guards, explicit state management, and progressive human oversight. Their findings indicate that engineering design choices (editing strategy, safety mechanisms, and trust calibration) matter more for production effectiveness than the underlying model alone, which narrows the gap between technical prototypes and practical deployment of code generation agents.
📝 Abstract
Enterprise teams building internal coding agents face a gap between prototype performance and production readiness. The root cause is that technical model quality alone is insufficient -- tool design, safety enforcement, state management, and human trust calibration are equally decisive, yet underreported in the literature. We present CodeGen, an internal coding agent at Zup, and show that targeted tool design (e.g., string-replacement edits over full-file rewrites) and layered safety guardrails improved agent reliability more than prompt engineering, while progressive human oversight modes drove organic adoption without mandating trust. These findings suggest that the engineering decisions surrounding the model -- not the model itself -- determine whether a coding agent delivers real value in practice.
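To make the contrast between string-replacement edits and full-file rewrites concrete, here is a minimal Python sketch of a string-replacement edit tool. It is an illustration only: the abstract does not describe CodeGen's actual implementation, and the function name `apply_string_replacement` and the rule that the target text must match exactly once are assumptions made for this example.

```python
def apply_string_replacement(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` in `source` with `new`.

    Hypothetical sketch, not CodeGen's real tool. The edit is rejected
    (assumed policy) when the target is missing or ambiguous, so the
    agent cannot silently change the wrong part of a file.
    """
    if not old:
        raise ValueError("old text must be non-empty")
    count = source.count(old)
    if count == 0:
        raise ValueError("old text not found; edit rejected")
    if count > 1:
        raise ValueError(f"old text matches {count} locations; add more context")
    # Only the matched span changes; the rest of the file is left untouched.
    return source.replace(old, new, 1)


if __name__ == "__main__":
    source = "def add(a, b):\n    return a - b\n"
    patched = apply_string_replacement(source, "return a - b", "return a + b")
    print(patched)
```

Under these assumptions, the model only has to reproduce the small span it wants to change rather than regenerate the whole file, and an edit that does not apply cleanly fails loudly instead of corrupting unrelated code.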