AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLM Sampling and Simulators

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the dual challenges of a vast design space and poor interpretability in distributed scheduling policy design for large-scale cloud environments. We propose an LLM-driven, interpretable AI design framework that uses large language models to stochastically generate Python-based scheduling policies, which are then deterministically validated with a domain-specific simulator (Eudoxia). The framework establishes a “generate–feedback–optimize” closed loop that enables targeted, efficient exploration of the policy space. It integrates the Function-as-a-Service runtime Bauplan for rapid policy execution and facilitates automated construction of simulation environments. Experiments demonstrate significant throughput improvements across multiple benchmarks, validating the framework's effectiveness. Our key contribution is the first deep integration of LLM-based policy sampling with high-fidelity simulator validation, ensuring policy interpretability while substantially improving automation and search efficiency in complex distributed system design.

📝 Abstract
We explore AI-driven distributed-systems policy design by combining stochastic code generation from large language models (LLMs) with deterministic verification in a domain-specific simulator. Using a Function-as-a-Service runtime (Bauplan) and its open-source simulator (Eudoxia) as a case study, we frame scheduler design as an iterative generate-and-verify loop: an LLM proposes a Python policy, the simulator evaluates it on standardized traces, and structured feedback steers subsequent generations. This setup preserves interpretability while enabling targeted search over a large design space. We detail the system architecture and report preliminary results on throughput improvements across multiple models. Beyond early gains, we discuss the limits of the current setup and outline next steps; in particular, we conjecture that AI will be crucial for scaling this methodology by helping to bootstrap new simulators.
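The generate-and-verify loop described in the abstract can be sketched in miniature. Everything below is illustrative: the candidate heuristics, the toy single-worker simulator, and the `propose_policy` stand-in for an LLM call are all assumptions of this sketch, not Bauplan's or Eudoxia's actual APIs. The structure mirrors the paper's loop: a stochastic proposal step, a deterministic evaluation on a fixed trace, and textual feedback that would steer the next generation.

```python
import random

def propose_policy(feedback):
    """Stand-in for an LLM call: stochastically pick a scheduling
    heuristic. A real system would prompt a model with the feedback
    string and receive generated Python code instead."""
    candidates = {
        "fifo": lambda jobs: sorted(jobs, key=lambda j: j["arrival"]),
        "shortest_first": lambda jobs: sorted(jobs, key=lambda j: j["runtime"]),
        "longest_first": lambda jobs: sorted(jobs, key=lambda j: -j["runtime"]),
    }
    name = random.choice(list(candidates))
    return name, candidates[name]

def simulate(policy, trace):
    """Deterministic toy simulator: run jobs on one worker in the
    order the policy chooses; return mean completion time (lower is
    better). Stands in for evaluating a policy on standardized traces."""
    clock, total = 0.0, 0.0
    for job in policy(trace):
        clock += job["runtime"]
        total += clock - job["arrival"]
    return total / len(trace)

def search(trace, iterations=20, seed=0):
    """Closed loop: propose, verify deterministically, keep the best,
    and fold the result into feedback for the next proposal."""
    random.seed(seed)
    best_name, best_score, feedback = None, float("inf"), ""
    for _ in range(iterations):
        name, policy = propose_policy(feedback)
        score = simulate(policy, trace)  # deterministic verification
        if score < best_score:
            best_name, best_score = name, score
        feedback = f"{name} scored {score:.2f}; best so far {best_score:.2f}"
    return best_name, best_score

# Tiny synthetic trace: three jobs, all arriving at time 0.
trace = [{"arrival": 0, "runtime": 5},
         {"arrival": 0, "runtime": 1},
         {"arrival": 0, "runtime": 3}]
print(search(trace))
```

On this trace, shortest-job-first minimizes mean completion time, so a sufficiently long search converges to it; in the real framework the proposal space is open-ended generated code rather than a fixed menu of heuristics.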
Problem

Research questions and friction points this paper is trying to address.

Optimizing cloud scheduler design through iterative LLM generation
Combining stochastic AI proposals with deterministic simulator verification
Scaling distributed systems via AI-generated policies and simulation feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate policies via stochastic code sampling
Simulator verifies policies with deterministic evaluation
Iterative feedback loop refines scheduler designs progressively