🤖 AI Summary
This work addresses a central challenge in AI-driven algorithm discovery: the approach critically depends on a "reliable verifier" that can accurately judge whether a candidate solution solves the given problem, and such verifiers are often unavailable in complex, real-world settings. We argue that systems performance research naturally satisfies this assumption, because candidate algorithms can be run in real systems or high-fidelity simulators against predefined workloads. Building on this, we propose AI-Driven Research for Systems (ADRS), an iterative evolutionary framework in which large language models (LLMs) autonomously generate algorithms and refine them via performance feedback loops. Implemented on the open-source OpenEvolve platform, ADRS fully automates algorithm generation, evaluation, and optimization. We present the first systematic demonstration of AI autonomously discovering high-performance algorithms across diverse systems domains, including load balancing, Mixture-of-Experts (MoE) inference, LLM-based SQL query processing, and transaction scheduling. Experimental results show that ADRS-generated algorithms outperform human-designed counterparts across multiple benchmarks, achieving up to 5.0× runtime speedups or 50% cost reductions. This shifts the research paradigm from manual algorithm design toward problem formulation and strategic guidance.
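To make the feedback loop concrete, here is a minimal Python sketch of the generate-evaluate-refine cycle described above. All names (`llm_propose`, `run_benchmark`, the feedback string) are hypothetical placeholders standing in for an LLM call and a performance evaluator; this is not OpenEvolve's actual API.

```python
# Minimal sketch of an ADRS-style generate-evaluate-refine loop.
# `llm_propose` and `run_benchmark` are hypothetical placeholders,
# not OpenEvolve's actual API.
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str      # candidate algorithm (e.g., a scheduling policy)
    score: float   # measured performance (higher is better)

def llm_propose(parent_code: str, feedback: str) -> str:
    """Stand-in for an LLM call that refines a candidate given feedback."""
    return parent_code + f"\n# refined using feedback: {feedback}"

def run_benchmark(code: str) -> float:
    """Stand-in verifier: in a real ADRS run, this would execute the candidate
    against predefined workloads (in a simulator or real system) and return
    a measured performance score."""
    return random.random()

def evolve(seed_code: str, generations: int = 10, children_per_gen: int = 4) -> Candidate:
    best = Candidate(seed_code, run_benchmark(seed_code))
    for _ in range(generations):
        feedback = f"parent scored {best.score:.3f}; improve tail latency"
        for _ in range(children_per_gen):
            code = llm_propose(best.code, feedback)
            child = Candidate(code, run_benchmark(code))
            if child.score > best.score:   # keep the best-so-far (elitism)
                best = child
    return best

best = evolve("def schedule(jobs): return sorted(jobs)")
print(f"best score: {best.score:.3f}")
```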
📝 Abstract
Artificial Intelligence (AI) is starting to transform the research process as we know it by automating the discovery of new solutions. Given a task, the typical AI-driven approach is (i) to generate a set of diverse candidate solutions and then (ii) to verify these candidates and select one that solves the problem. Crucially, this approach assumes the existence of a reliable verifier, i.e., one that can accurately determine whether a solution solves the given problem. We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. This is because system performance problems naturally admit reliable verifiers: solutions are typically implemented in real systems or simulators, and verification reduces to running these software artifacts against predefined workloads and measuring performance. We term this approach AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines candidate solutions. Using OpenEvolve, an existing open-source ADRS instance, we present case studies across diverse domains, including load balancing for multi-region cloud scheduling, Mixture-of-Experts inference, LLM-based SQL queries, and transaction scheduling. In multiple instances, ADRS discovers algorithms that outperform state-of-the-art human designs (e.g., achieving up to 5.0x runtime improvements or 50% cost reductions). We distill best practices for guiding algorithm evolution in existing frameworks, from prompt design to evaluator construction. We then discuss the broader implications for the systems community: as AI assumes a central role in algorithm design, human researchers will increasingly focus on problem formulation and strategic guidance. Our results highlight both the disruptive potential of this approach and the urgent need to adapt systems research practices in the age of AI.
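To illustrate why verification "reduces to running these software artifacts against predefined workloads and measuring performance," here is a toy verifier sketch for the load-balancing domain. The policies, the workload, and all names are illustrative assumptions, not taken from the paper or OpenEvolve: a candidate placement policy is replayed through a minimal simulator and scored by makespan.

```python
# Toy verifier sketch: score a load-balancing policy by replaying a fixed
# workload in a minimal simulator. All names and workloads are illustrative,
# not taken from the paper or OpenEvolve.
import random
from typing import Callable, List

Policy = Callable[[List[float]], int]  # maps current server loads -> server index

def evaluate(policy: Policy, jobs: List[float], n_servers: int = 4) -> float:
    """Replay `jobs` (job durations) and return the makespan (lower is better)."""
    loads = [0.0] * n_servers
    for duration in jobs:
        server = policy(loads)      # candidate decides placement
        loads[server] += duration   # accumulate load on the chosen server
    return max(loads)               # makespan = finish time of the busiest server

# Two candidate policies an ADRS-style search might compare:
def random_placement(loads: List[float]) -> int:
    return random.randrange(len(loads))

def least_loaded(loads: List[float]) -> int:
    return loads.index(min(loads))

workload = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
print(evaluate(random_placement, workload))  # baseline score
print(evaluate(least_loaded, workload))      # typically a lower makespan
```

Because the simulator is deterministic for a given policy and workload, the score is a trustworthy fitness signal, which is exactly the property that makes systems problems amenable to evolutionary search.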