Scalable Machine Learning Training Infrastructure for Online Ads Recommendation and Auction Scoring Modeling at Google

📅 2025-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three key challenges in Google's online advertising recommendation and auction scoring — inefficient input generation, bottlenecks in large-scale embedding table processing, and resource waste from fault-tolerant scheduling — this work proposes an end-to-end training infrastructure optimization. Methodologically, it introduces: (1) a shared input generation mechanism that unifies raw-data-to-numerical-feature conversion across models; (2) a co-designed framework integrating embedding table partitioning, pipelined execution, and RPC-based aggregation; and (3) a preemption-aware scheduler enabling training suspension and rapid resumption. The system incorporates TPU acceleration, sparse-to-dense feature conversion, distributed RPC optimization, and streaming data injection. Evaluated in production, the solution achieves a 116% increase in training throughput and an 18% reduction in unit cost, demonstrating significant scalability and efficiency gains for large-scale ad ranking models.
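The preemption-aware scheduling idea above can be illustrated with a minimal sketch. This is a hypothetical Python illustration, not the paper's actual implementation: it assumes a SIGTERM-style preemption notice from the cluster scheduler, and the `PreemptionAwareTrainer` class, `train_step`, and `save_checkpoint` names are invented for this example. The key behavior is that the notice is only acted on at a step boundary, so in-flight work is checkpointed rather than wasted.

```python
import signal


class PreemptionAwareTrainer:
    """Hypothetical sketch of a preemption-notice handler: on SIGTERM,
    finish the current step, checkpoint, and hold training so resources
    are released promptly instead of losing in-progress work."""

    def __init__(self):
        self.preempted = False
        self.last_checkpoint = None
        # Register the preemption notice; real schedulers may use a
        # different signal or an RPC-based notification.
        signal.signal(signal.SIGTERM, self._on_notice)

    def _on_notice(self, signum, frame):
        # Defer all handling to a safe step boundary.
        self.preempted = True

    def run(self, num_steps):
        step = 0
        while step < num_steps and not self.preempted:
            self.train_step(step)
            step += 1
        if self.preempted:
            # Training hold: persist state so resumption is fast.
            self.save_checkpoint(step)
        return step

    def train_step(self, step):
        pass  # placeholder for one optimizer update

    def save_checkpoint(self, step):
        self.last_checkpoint = step
```

A run that receives a notice mid-training stops at the next step boundary with a checkpoint recorded, while an uninterrupted run completes all requested steps.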

📝 Abstract
Large-scale Ads recommendation and auction scoring models at Google scale demand immense computational resources. While specialized hardware like TPUs has improved linear algebra computations, bottlenecks persist in large-scale systems. This paper proposes solutions for three critical challenges that must be addressed for efficient end-to-end execution in a widely used production infrastructure: (1) Input Generation and Ingestion Pipeline: efficiently transforming raw features (e.g., "search query") into numerical inputs and streaming them to TPUs; (2) Large Embedding Tables: optimizing conversion of sparse features into dense floating-point vectors for neural network consumption; (3) Interruptions and Error Handling: minimizing resource wastage in large-scale shared datacenters. To tackle these challenges, we propose a shared input generation technique that reduces the computational load of input generation by amortizing costs across many models. Furthermore, we propose partitioning, pipelining, and RPC (Remote Procedure Call) coalescing software techniques to optimize embedding operations. To maintain efficiency at scale, we describe novel preemption notice and training hold mechanisms that minimize resource wastage and ensure prompt error resolution. These techniques have demonstrated significant improvement in Google production, achieving a 116% performance boost and an 18% reduction in training costs across representative models.
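The embedding-table techniques in the abstract (partitioning plus RPC coalescing for sparse-to-dense conversion) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's system: NumPy arrays stand in for remote table partitions, `lookup_coalesced` is an invented name, and mean pooling is one common choice for combining embedding rows. The point it shows is that grouping IDs by destination shard turns many per-ID lookups into one batched request per shard.

```python
from collections import defaultdict

import numpy as np

NUM_SHARDS = 4        # hypothetical number of embedding-table partitions
ROWS_PER_SHARD = 1000
EMB_DIM = 8

rng = np.random.default_rng(0)
# One array per shard stands in for a remote embedding-table partition.
shards = [rng.normal(size=(ROWS_PER_SHARD, EMB_DIM)) for _ in range(NUM_SHARDS)]


def lookup_coalesced(sparse_ids):
    """Convert sparse feature IDs into one dense vector (mean pooling).

    IDs destined for the same shard are grouped into a single batched
    lookup, mimicking RPC coalescing: one request per shard instead of
    one request per ID.
    """
    per_shard = defaultdict(list)
    for fid in sparse_ids:
        # Simple modulo partitioning: shard index, then local row.
        per_shard[fid % NUM_SHARDS].append(fid // NUM_SHARDS)
    gathered = []
    for shard_idx, rows in per_shard.items():
        # One coalesced "RPC" per shard: a single vectorized gather.
        gathered.append(shards[shard_idx][rows])
    return np.concatenate(gathered).mean(axis=0)


dense = lookup_coalesced([3, 7, 42, 43])  # four sparse IDs, two shards touched
```

In a real distributed setting, each per-shard gather would be a network call, so coalescing reduces both request count and per-RPC overhead.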
Problem

Research questions and friction points this paper is trying to address.

Data Transformation
Large-scale Tabular Data Optimization
Resource Efficiency in Learning Systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable Training Platform
Efficient Data Transformation
Preemptive Error Management
👥 Authors

George Kurian (Nvidia)
Somayeh Sardashti (University of Wisconsin-Madison, Computer Sciences)
Ryan Sims (Google LLC)
Felix Berger (Google LLC)
Gary Holt (Google LLC)
Yang Li (Google LLC)
Jeremiah Willcock (Google LLC)
Kaiyuan Wang (Staff Software Engineer, Google)
Herve Quiroz (Google LLC)
Abdulrahman Salem (Google LLC)
Julian Grady (Google LLC)