🤖 AI Summary
This work addresses the challenge enterprises face in deploying AI agents that balance capability, data sovereignty, and cost-effectiveness. Small models are a natural fit, but specializing them is hindered by fragmented development pipelines. To overcome this, the authors propose EnterpriseLab, an end-to-end platform that modularly integrates enterprise applications via the Model Context Protocol and establishes a closed-loop framework spanning tool integration, programmatic environment encapsulation, trajectory synthesis, and continuous training and evaluation. This approach enables efficient generation of high-quality training data and supports automated model iteration. The resulting 8B-parameter model outperforms existing small models by 10% on the EnterpriseBench and CRMArena benchmarks and matches GPT-4o's performance while reducing inference costs by 8–10×.
📝 Abstract
Deploying AI agents in enterprise environments requires balancing capability with data sovereignty and cost constraints. While small language models offer privacy-preserving alternatives to frontier models, their specialization is hindered by fragmented development pipelines that separate tool integration, data generation, and training. We introduce EnterpriseLab, a full-stack platform that unifies these stages into a closed-loop framework. EnterpriseLab provides (1) a modular environment that exposes enterprise applications via the Model Context Protocol, enabling seamless integration of proprietary and open-source tools; (2) automated trajectory synthesis that programmatically generates training data from environment schemas; and (3) integrated training pipelines with continuous evaluation. We validate the platform through EnterpriseArena, an instantiation with 15 applications and 140+ tools across IT, HR, sales, and engineering domains. Our results demonstrate that 8B-parameter models trained within EnterpriseLab match GPT-4o's performance on complex enterprise workflows while reducing inference costs by 8–10×, and remain robust across diverse enterprise benchmarks, including EnterpriseBench (+10%) and CRMArena (+10%). EnterpriseLab offers enterprises a practical path to deploying capable, privacy-preserving agents without compromising operational capability.
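To make the idea of generating training data "from environment schemas" concrete, the following is a minimal sketch, not EnterpriseLab's actual implementation. It assumes tools are described in the shape used by Model Context Protocol tool listings (`name`, `description`, `inputSchema` as JSON Schema). The two tools, the random sampling strategy, and the helper names (`sample_call`, `is_valid_call`, `synthesize_trajectory`) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical tool descriptors in the style of an MCP tool listing.
# These tools are illustrative and are not taken from EnterpriseArena.
TOOLS = [
    {
        "name": "create_ticket",
        "description": "Open an IT support ticket.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["title", "priority"],
        },
    },
    {
        "name": "assign_ticket",
        "description": "Assign an existing ticket to an employee.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "integer"},
                "assignee": {"type": "string"},
            },
            "required": ["ticket_id", "assignee"],
        },
    },
]

# Mapping from the JSON Schema types handled here to Python types.
PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def sample_value(prop, rng):
    """Draw one value that satisfies a (very small subset of) JSON Schema."""
    if "enum" in prop:
        return rng.choice(prop["enum"])
    kind = prop.get("type")
    if kind == "string":
        return f"sample-{rng.randint(0, 999)}"
    if kind == "integer":
        return rng.randint(1, 100)
    if kind == "boolean":
        return rng.choice([True, False])
    raise ValueError(f"unsupported type: {kind!r}")


def sample_call(tool, rng):
    """Build one tool call by filling every required argument from the schema."""
    schema = tool["inputSchema"]
    props = schema.get("properties", {})
    args = {name: sample_value(props[name], rng) for name in schema.get("required", [])}
    return {"tool": tool["name"], "arguments": args}


def is_valid_call(call, tools):
    """Check a call against its tool schema: known tool, required args, types, enums."""
    tool = {t["name"]: t for t in tools}.get(call["tool"])
    if tool is None:
        return False
    schema = tool["inputSchema"]
    props = schema.get("properties", {})
    args = call["arguments"]
    if any(name not in args for name in schema.get("required", [])):
        return False
    for name, value in args.items():
        prop = props.get(name)
        if prop is None:
            return False
        if "enum" in prop and value not in prop["enum"]:
            return False
        expected = PY_TYPES[prop["type"]]
        # bool is a subclass of int in Python, so reject it for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return False
    return True


def synthesize_trajectory(tools, steps, seed):
    """Return a reproducible list of schema-conforming tool calls."""
    rng = random.Random(seed)
    return [sample_call(rng.choice(tools), rng) for _ in range(steps)]


if __name__ == "__main__":
    for step in synthesize_trajectory(TOOLS, 3, seed=0):
        print(step)
```

A real pipeline would presumably execute each call against the environment, record the observations, and filter trajectories by task success; this sketch stops at schema-level validity, which is the part that can be derived from the tool schemas alone.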