Let the Barbarians In: How AI Can Accelerate Systems Performance Research

📅 2025-12-16
🤖 AI Summary
To address the high cost of manual design-space exploration and the expert-dependent validation of system performance optimizations, this paper proposes the AI-Driven Research for Systems (ADRS) paradigm: a closed-loop "generate, evaluate, refine" framework. ADRS combines LLM-based solution generation, validation in simulators or real systems, evolutionary search (e.g., OpenEvolve), and workload-driven evaluation into an iterative refinement cycle, and distills practical guidelines for applying it, including prompt specification, feedback mechanisms, and robust evaluation. Evaluated on ten real-world case studies, including multi-region cloud scheduling, mixture-of-experts load balancing, and LLM-based SQL, ADRS-generated solutions consistently match or surpass human-designed state-of-the-art approaches, demonstrating the paradigm's effectiveness, practicality, and cross-scenario generality.

📝 Abstract
Artificial Intelligence (AI) is beginning to transform the research process by automating the discovery of new solutions. This shift depends on the availability of reliable verifiers, which AI-driven approaches require to validate candidate solutions. Research focused on improving systems performance is especially well-suited to this paradigm because system performance problems naturally admit such verifiers: candidates can be implemented in real systems or simulators and evaluated against predefined workloads. We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems (ADRS). Using several open-source ADRS instances (i.e., OpenEvolve, GEPA, and ShinkaEvolve), we demonstrate across ten case studies (e.g., multi-region cloud scheduling, mixture-of-experts load balancing, LLM-based SQL, transaction scheduling) that ADRS-generated solutions can match or even outperform human state-of-the-art designs. Based on these findings, we outline best practices (e.g., level of prompt specification, amount of feedback, robust evaluation) for effectively using ADRS, and we discuss future research directions and their implications. Although we do not yet have a universal recipe for applying ADRS across all of systems research, we hope our preliminary findings, together with the challenges we identify, offer meaningful guidance for future work as researcher effort shifts increasingly toward problem formulation and strategic oversight. Note: This paper is an extension of our prior work [14]. It adds extensive evaluation across multiple ADRS frameworks and provides deeper analysis and insights into best practices.
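The iterative cycle of generation, evaluation, and refinement that the abstract describes can be illustrated with a minimal sketch. Everything below is illustrative and not from the paper: the evaluator is a toy workload cost function standing in for a real simulator or system, and `propose` is a random perturbation standing in for LLM-generated candidate solutions.

```python
import random

def evaluate(candidate, workload):
    # Mock verifier: score a candidate "policy" (here, a single threshold
    # parameter) against a predefined workload. In a real ADRS setup this
    # step would run the candidate in a simulator or real system.
    return sum(abs(job - candidate["threshold"]) for job in workload)

def propose(parent):
    # Stand-in for LLM-based generation: produce a variant of the parent.
    child = dict(parent)
    child["threshold"] += random.uniform(-1.0, 1.0)
    return child

def adrs_loop(workload, generations=200, seed=0):
    # Generate-evaluate-refine: keep a candidate only if the verifier
    # confirms it improves on the current best.
    random.seed(seed)
    best = {"threshold": 0.0}
    best_cost = evaluate(best, workload)
    for _ in range(generations):
        cand = propose(best)
        cost = evaluate(cand, workload)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

if __name__ == "__main__":
    workload = [3.0, 5.0, 7.0]  # toy job sizes
    best, cost = adrs_loop(workload)
    print(best, cost)
```

The key property the paper relies on is visible even in this toy: because the evaluator is a reliable, automated verifier, the loop can run unattended, and the researcher's effort shifts to formulating the problem (the workload and cost function) and overseeing the search.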
Problem

Research questions and friction points this paper is trying to address.

Manually exploring the design space of systems performance solutions is slow, costly, and expert-dependent
AI-driven discovery requires reliable verifiers to validate candidate solutions, and it is unclear which research problems admit them
There is no established methodology or set of best practices for applying AI-driven search to systems performance research
Innovation

Methods, ideas, or system contributions that make the work stand out.

ADRS: an iterative generate-evaluate-refine loop pairing LLM-based generation with verifier-based evaluation against predefined workloads
Ten case studies across open-source frameworks (OpenEvolve, GEPA, ShinkaEvolve) in which generated solutions match or outperform human state-of-the-art designs
Best practices for prompt specification, feedback volume, and robust evaluation, plus identified challenges for future work