stratum: A System Infrastructure for Massive Agent-Centric ML Workloads

πŸ“… 2026-03-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the inefficiency of the current Python machine learning ecosystem in supporting large-scale, highly concurrent machine learning pipeline searches driven by large language model (LLM) agents. To overcome this limitation, we propose the first system architecture specifically designed for agent-driven ML workloads, which decouples the agent’s planning and reasoning from pipeline execution to enable batch compilation and efficient scheduling. Our system introduces pipeline graph compilation, batched execution optimization, and a high-performance Rust runtime, while seamlessly integrating with mainstream Python libraries and supporting heterogeneous backends including CPUs and GPUs. Experimental results demonstrate that our approach achieves up to a 16.6Γ— speedup on large-scale agent-driven ML pipeline search tasks.
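The central design idea in the summary is separating the agent's planning step from pipeline execution so that many candidate pipelines can be executed as a batch. The following is a minimal sketch of that decoupling, not stratum's actual API: all function names, the spec format, and the thread-pool scheduler are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_pipelines(n_candidates):
    """Stand-in for the agent's planning/reasoning phase: emit pipeline
    specs (ordered lists of (stage_name, params) pairs) without executing
    anything."""
    return [
        [("impute", {"strategy": "mean"}), ("scale", {}), ("model", {"depth": d})]
        for d in range(1, n_candidates + 1)
    ]

def execute(spec):
    """Stand-in for the runtime: interpret one pipeline spec. A real
    system would dispatch each stage to CPU/GPU kernels; here we just
    record the stages that would run."""
    return [stage for stage, _ in spec]

def run_batch(specs, workers=4):
    """Execute a whole batch of planned pipelines concurrently, so that
    planning and execution never block each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(execute, specs))

specs = plan_pipelines(8)   # planning produces data, not side effects
results = run_batch(specs)  # execution consumes the whole batch at once
```

Because planning yields inert specs rather than running library calls eagerly, the runtime is free to reorder, batch, and schedule executions across backends, which is the property the paper's architecture exploits.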

πŸ“ Abstract
Recent advances in large language models (LLMs) are transforming how machine learning (ML) pipelines are developed and evaluated. LLMs enable a new type of workload, agentic pipeline search, in which autonomous or semi-autonomous agents generate, validate, and optimize complete ML pipelines. These agents predominantly operate over popular Python ML libraries and exhibit highly exploratory behavior. This results in thousands of executions for data profiling, pipeline generation, and iterative refinement of pipeline stages. However, the existing Python-based ML ecosystem is built around libraries such as Pandas and scikit-learn, which are designed for human-centric, interactive, sequential workflows and remain constrained by Python's interpretive execution model, library-level isolation, and limited runtime support for executing large numbers of pipelines. Meanwhile, many high-performance ML systems proposed by the systems community either target narrow workload classes or require specialized programming models, which limits their integration with the Python ML ecosystem and makes them largely ill-suited for LLM-based agents. This growing mismatch exposes a fundamental systems challenge in supporting agentic pipeline search at scale. We therefore propose stratum, a unified system infrastructure that decouples pipeline execution from planning and reasoning during agentic pipeline search. Stratum integrates seamlessly with existing Python libraries, compiles batches of pipelines into optimized execution graphs, and efficiently executes them across heterogeneous backends, including a novel Rust-based runtime. We present stratum's architectural vision along with an early prototype, discuss key design decisions, and outline open challenges and research directions. Finally, preliminary experiments show that stratum can speed up large-scale agentic pipeline search by up to 16.6Γ—.
Problem

Research questions and friction points this paper is trying to address.

- agentic pipeline search
- large language models
- ML system infrastructure
- Python ecosystem
- massive ML workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

- agentic pipeline search
- system infrastructure
- pipeline compilation
- heterogeneous execution
- Rust runtime
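Among the contributions above, "pipeline compilation" paired with batched execution typically means merging many candidate pipelines into a single graph so that stages shared across candidates run once. Here is a minimal, hypothetical sketch of prefix sharing via a trie keyed on stage identity; the spec format and all helper names are assumptions for illustration, not stratum's implementation.

```python
def compile_batch(specs):
    """Merge many pipeline specs into one prefix-sharing DAG (a trie
    keyed on stage identity), so stages common to several candidate
    pipelines appear once and would execute once per batch."""
    root = {"children": {}, "stage": None}
    for spec in specs:
        node = root
        for stage in spec:
            key = repr(stage)  # stage identity = (name, params)
            node = node["children"].setdefault(
                key, {"children": {}, "stage": stage}
            )
    return root

def count_nodes(node):
    """Count the unique stage executions the compiled graph needs."""
    return sum(1 + count_nodes(c) for c in node["children"].values())

# Three candidate pipelines sharing a data-loading and imputation prefix.
specs = [
    [("load", ()), ("impute", ("mean",)), ("model", ("rf",))],
    [("load", ()), ("impute", ("mean",)), ("model", ("gbm",))],
    [("load", ()), ("impute", ("median",)), ("model", ("rf",))],
]
graph = compile_batch(specs)
# Naive execution runs 9 stages; the shared graph needs only 6.
```

The gap between naive and shared stage counts grows with the redundancy of the agent's exploration, which is why batched compilation can yield large speedups on exploratory workloads.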