🤖 AI Summary
Deploying trillion-parameter foundation models for online advertising recommendation faces two critical challenges in industrial settings: meeting strict low-latency inference requirements and adapting to dynamically shifting data distributions.
Method: This paper proposes ExFM, an external distillation framework featuring: (1) a teacher foundation model reuse mechanism enabling cross-distribution knowledge transfer; (2) co-designed auxiliary heads and student adapters for computational cost sharing; and (3) a collaborative architecture integrating a Data Augmentation System (DAS) with streaming self-adaptive modeling across the foundation and vertical models.
Results: Evaluated on both industrial and public benchmarks, ExFM significantly improves recommendation accuracy while maintaining controllable inference latency and reducing training overhead by over 40%. It establishes a novel paradigm for efficient deployment of ultra-large-scale models in real-time recommendation systems.
📝 Abstract
Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling up and refining the design of the recommendation model can bring significant performance improvements. However, as model scale grows, such prior studies drift increasingly far from industrial practice, as they often neglect two fundamental challenges of industrial-scale applications. First, training and inference budgets for a served model are restricted; exceeding them incurs latency and impairs user experience. Second, large volumes of data arrive in a streaming mode with dynamically shifting distributions, as new users/ads join and existing users/ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address these overlooked challenges. Specifically, we develop external distillation and a Data Augmentation System (DAS) to control the computational cost of training and inference while maintaining high performance. We design the teacher as a foundation model (FM) that can serve multiple students as vertical models (VMs), amortizing its building cost. We propose an Auxiliary Head and a Student Adapter to mitigate the data-distribution gap between the FM and the VMs caused by streaming data. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gains from ExFM.
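The external-distillation setup described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, the scalar-affine form of the adapter, and the loss weighting `alpha` are all assumptions. The idea it shows is that the vertical model's serving head fits ground-truth labels, while a separate auxiliary head distills from the teacher FM's (adapter-corrected) logged prediction, so the distillation signal does not directly distort the serving head.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y):
    """Binary cross-entropy between prediction p and (possibly soft) target y."""
    eps = 1e-7  # clip for numerical safety
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def student_adapter(teacher_logit, scale=1.0, bias=0.0):
    """Hypothetical student adapter: a lightweight learned transform
    (here a scalar affine map as a stand-in) that shifts the teacher's
    logged prediction toward the vertical model's data distribution."""
    return sigmoid(scale * teacher_logit + bias)

def exfm_style_loss(serving_logit, aux_logit, adapted_teacher_prob,
                    label, alpha=0.5):
    """Combined VM loss: the serving head fits the ground-truth label,
    while the auxiliary head distills from the adapter-corrected
    teacher prediction (soft target); alpha weights the two terms."""
    serving_loss = bce(sigmoid(serving_logit), label)
    distill_loss = bce(sigmoid(aux_logit), adapted_teacher_prob)
    return serving_loss + alpha * distill_loss
```

For example, with a logged teacher logit of 1.5 and a positive label, the VM's training loss would be `exfm_style_loss(0.2, 0.1, student_adapter(1.5), 1.0)`; because the teacher's prediction enters only through the auxiliary head, the teacher can be trained once and reused (external to each VM), amortizing its cost across vertical models.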