ECO: An LLM-Driven Efficient Code Optimizer for Warehouse Scale Computers

πŸ“… 2025-03-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Manual performance optimization in hyperscale data centers is costly, error-prone, and unscalable. Method: This paper introduces the first end-to-end automated code optimization framework, integrating (i) a historical-commit-driven performance anti-pattern dictionary and (ii) a domain-finetuned large language model (LLM) to generate trustworthy refactoring proposals; optimization safety is ensured via automated validation and production-grade A/B testing. Contribution/Results: Deployed in Google’s production environment across >100 million lines of code, the framework achieves >99.5% optimization success rate, with 6,400+ validated optimization commits modifying 25,000 lines of code. It delivers an average quarterly saving of over 500,000 normalized CPU cores. This work establishes the first empirically validated, high-reliability, and scalable AI-driven performance optimization paradigm for hyperscale production systems.

πŸ“ Abstract
With the end of Moore's Law, optimizing code for performance has become paramount for meeting ever-increasing compute demands, particularly in hyperscale data centers where even small efficiency gains translate to significant resource and energy savings. Traditionally, this process requires significant programmer effort to identify optimization opportunities, modify the code to implement the optimization, and carefully deploy and measure the optimization's impact. Despite a significant amount of work on automating program edits and promising results in small-scale settings, such performance optimizations have remained elusive in large real-world production environments, due to the scale, high degree of complexity, and reliability required. This paper introduces ECO (Efficient Code Optimizer), a system that automatically refactors source code to improve performance at scale. To achieve these performance gains, ECO searches through historical commits at scale to create a dictionary of performance anti-patterns that these commits addressed. These anti-patterns are used to search for similar patterns in a code base of billions of lines of code, pinpointing other code segments with similar potential optimization opportunities. Using a fine-tuned LLM, ECO then automatically refactors the code to generate and apply similar edits. Next, ECO verifies the transformed code, submits it for code review, and measures the impact of the optimization in production. Currently deployed on Google's hyperscale production fleet, this system has driven >25k changed lines of production code, across over 6.4k submitted commits, with a >99.5% production success rate. Over the past year, ECO has consistently resulted in significant performance savings every quarter. On average, the savings produced per quarter are equivalent to over 500k normalized CPU cores.
Problem

Research questions and friction points this paper is trying to address.

Automates code optimization for large-scale production environments.
Identifies and refactors performance anti-patterns using historical data.
Improves efficiency and reduces resource usage in hyperscale data centers.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated code refactoring using LLM
Historical commit analysis for anti-patterns
Large-scale production optimization verification
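The three stages above (anti-pattern mining, codebase matching, and verified refactoring) can be sketched as a minimal pipeline. All function names, data shapes, and the substring-based matching and template-based "refactoring" below are illustrative assumptions; the paper does not publish an API, and the real system uses an LLM rather than string substitution.

```python
def mine_anti_patterns(historical_commits):
    """Build a dictionary mapping each anti-pattern to the fix its commit applied.

    Each commit is assumed (hypothetically) to carry a 'pattern' and a
    'fix_template' field extracted from its before/after diff.
    """
    return {c["pattern"]: c["fix_template"] for c in historical_commits}


def find_candidates(codebase, anti_patterns):
    """Scan code segments for occurrences of known anti-patterns.

    Substring matching stands in for the paper's large-scale pattern search.
    """
    return [(seg, p) for seg in codebase for p in anti_patterns if p in seg]


def refactor(segment, pattern, anti_patterns):
    """Stand-in for the fine-tuned LLM: apply the matched pattern's fix template."""
    return segment.replace(pattern, anti_patterns[pattern])


def eco_pipeline(historical_commits, codebase):
    """Mine anti-patterns, locate matches, refactor, and keep only changed edits."""
    anti_patterns = mine_anti_patterns(historical_commits)
    edits = []
    for segment, pattern in find_candidates(codebase, anti_patterns):
        candidate = refactor(segment, pattern, anti_patterns)
        if candidate != segment:  # stand-in for automated validation / A/B testing
            edits.append(candidate)
    return edits
```

In the real system, the final check is replaced by automated verification, human code review, and production A/B measurement before an edit is counted toward the reported >99.5% success rate.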
πŸ”Ž Similar Papers
No similar papers found.
Hannah Lin
Google, Google DeepMind
Martin Maas
Google, Google DeepMind
Maximilian Roquemore
Google, Google DeepMind
Arman Hasanzadeh
Software Engineer, Google DeepMind
Graph Machine Learning, Representation Learning, Bayesian Inference, Graph Neural Networks, Graph Signal Processing
Fred Lewis
Google, Google DeepMind
Yusuf Simonson
Google, Google DeepMind
Tzu-Wei Yang
Google, Google DeepMind
Amir Yazdanbakhsh
Research Scientist at Google DeepMind
ML4HW, ML4Code, Sparsity, HW/SW Co-design, Computer Architecture and Systems
Deniz Altinbuken
Google, Google DeepMind
Maggie Nolan Edmonds
Google, Google DeepMind
Aditya Patil
Google, Google DeepMind
Chris Kennelly
Google, Google DeepMind
Milad Hashemi
Google
Computer Architecture, Machine Learning, Systems
Parthasarathy Ranganathan
Google
systems, computer architecture, datacenters, energy efficiency, power management