ECO: An LLM-Driven Efficient Code Optimizer for Warehouse Scale Computers

πŸ“… 2025-03-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Manual performance optimization in hyperscale data centers is costly, error-prone, and unscalable. Method: This paper introduces the first end-to-end automated code optimization framework, integrating (i) a historical-commit-driven performance anti-pattern dictionary and (ii) a domain-finetuned large language model (LLM) to generate trustworthy refactoring proposals; optimization safety is ensured via automated validation and production-grade A/B testing. Contribution/Results: Deployed in Google’s production environment across >100 million lines of code, the framework achieves >99.5% optimization success rate, with 6,400+ validated optimization commits modifying 25,000 lines of code. It delivers an average quarterly saving of over 500,000 normalized CPU cores. This work establishes the first empirically validated, high-reliability, and scalable AI-driven performance optimization paradigm for hyperscale production systems.

πŸ“ Abstract
With the end of Moore's Law, optimizing code for performance has become paramount for meeting ever-increasing compute demands, particularly in hyperscale data centers where even small efficiency gains translate to significant resource and energy savings. Traditionally, this process requires significant programmer effort to identify optimization opportunities, modify the code to implement the optimization, and carefully deploy and measure the optimization's impact. Despite a significant amount of work on automating program edits and promising results in small-scale settings, such performance optimizations have remained elusive in large real-world production environments, due to the scale, high degree of complexity, and reliability required. This paper introduces ECO (Efficient Code Optimizer), a system that automatically refactors source code to improve performance at scale. To achieve these performance gains, ECO searches through historical commits at scale to create a dictionary of performance anti-patterns that these commits addressed. These anti-patterns are used to search for similar patterns in a code base of billions of lines of code, pinpointing other code segments with similar potential optimization opportunities. Using a fine-tuned LLM, ECO then automatically refactors the code to generate and apply similar edits. Next, ECO verifies the transformed code, submits it for code review, and measures the impact of the optimization in production. Currently deployed on Google's hyperscale production fleet, this system has driven >25k changed lines of production code, across over 6.4k submitted commits, with a >99.5% production success rate. Over the past year, ECO has consistently resulted in significant performance savings every quarter. On average, the savings produced per quarter are equivalent to over 500k normalized CPU cores.
Problem

Research questions and friction points this paper is trying to address.

Automates code optimization for large-scale production environments.
Identifies and refactors performance anti-patterns using historical data.
Improves efficiency and reduces resource usage in hyperscale data centers.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated code refactoring using LLM
Historical commit analysis for anti-patterns
Large-scale production optimization verification
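The three stages above (anti-pattern mining, codebase matching, and verified refactoring) can be sketched as a minimal pipeline. All function names, data shapes, and the substring-based matching and template-based "refactoring" below are illustrative assumptions; the paper does not publish an API, and the real system uses an LLM rather than string substitution.

```python
def mine_anti_patterns(historical_commits):
    """Build a dictionary mapping each anti-pattern to the fix its commit applied.

    Each commit is assumed (hypothetically) to carry a 'pattern' and a
    'fix_template' field extracted from its before/after diff.
    """
    return {c["pattern"]: c["fix_template"] for c in historical_commits}


def find_candidates(codebase, anti_patterns):
    """Scan code segments for occurrences of known anti-patterns.

    Substring matching stands in for the paper's large-scale pattern search.
    """
    return [(seg, p) for seg in codebase for p in anti_patterns if p in seg]


def refactor(segment, pattern, anti_patterns):
    """Stand-in for the fine-tuned LLM: apply the matched pattern's fix template."""
    return segment.replace(pattern, anti_patterns[pattern])


def eco_pipeline(historical_commits, codebase):
    """Mine anti-patterns, locate matches, refactor, and keep only changed edits."""
    anti_patterns = mine_anti_patterns(historical_commits)
    edits = []
    for segment, pattern in find_candidates(codebase, anti_patterns):
        candidate = refactor(segment, pattern, anti_patterns)
        if candidate != segment:  # stand-in for automated validation / A/B testing
            edits.append(candidate)
    return edits
```

In the real system, the final check is replaced by automated verification, human code review, and production A/B measurement before an edit is counted toward the reported >99.5% success rate.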
πŸ”Ž Similar Papers
No similar papers found.
Hannah Lin
Google, Google DeepMind
Martin Maas
Google, Google DeepMind
Maximilian Roquemore
Google, Google DeepMind
Arman Hasanzadeh
Software Engineer, Google DeepMind
Graph Machine Learning, Representation Learning, Bayesian Inference, Graph Neural Networks, Graph Signal Processing
Fred Lewis
Google, Google DeepMind
Yusuf Simonson
Google, Google DeepMind
Tzu-Wei Yang
Google, Google DeepMind
Amir Yazdanbakhsh
Research Scientist at Google DeepMind
ML4HW, ML4Code, Sparsity, HW/SW Co-design, Computer Architecture and Systems
Deniz Altinbuken
Google, Google DeepMind
Maggie Nolan Edmonds
Google, Google DeepMind
Aditya Patil
Google, Google DeepMind
Chris Kennelly
Google, Google DeepMind
Milad Hashemi
Google
Computer Architecture, Machine Learning, Systems
Parthasarathy Ranganathan
Google
systems, computer architecture, datacenters, energy efficiency, power management