COFFEE: COdesign Framework for Feature Enriched Embeddings in Ads-Ranking Systems

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Advertising recommendation systems often struggle to capture user interests accurately because user and ad representations carry stale or limited information. This work proposes a three-dimensional co-design framework that strengthens those representations along three axes: data source diversity, historical behavior sequence length, and feature richness. Specifically, it integrates multi-source events (comparing the ROI of ad-impression sources against organic engagement sources such as content viewing), models longer user behavior sequences, and enriches events with additional attributes and multimodal embeddings. The approach improves ranking performance without increasing inference or serving complexity: CTR prediction improves by 0.56% AUC over the production baseline when enriched ad-impression sources are used, and for short online sequence lengths of 100 to 10K, ad-impression sources raise AUC and the slope of the scaling curve by 1.56× to 2× relative to organic usage sources.
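
To make the first two axes (source diversity and sequence length) concrete, here is a minimal sketch of merging per-source event streams into one time-ordered user sequence capped at a maximum length. It is an illustration under stated assumptions, not the paper's pipeline: the `Event` fields, the source names, and `build_sequence` are all hypothetical.

```python
from dataclasses import dataclass, field
from heapq import merge
from typing import Dict, Iterable, List


@dataclass
class Event:
    # Hypothetical schema: the paper does not specify its event format.
    timestamp: int                      # event time, e.g. seconds since epoch
    source: str                         # e.g. "organic" or "ad_impression"
    item_id: str                        # content or ad identifier
    attributes: Dict[str, str] = field(default_factory=dict)  # extra event attributes


def build_sequence(streams: Iterable[List[Event]], max_len: int) -> List[Event]:
    """Merge streams (each already sorted by timestamp) and keep the newest max_len events."""
    merged = list(merge(*streams, key=lambda e: e.timestamp))
    return merged[-max_len:] if max_len > 0 else []


if __name__ == "__main__":
    organic = [Event(1, "organic", "v1"), Event(4, "organic", "v2")]
    ads = [Event(2, "ad_impression", "a1", {"surface": "feed"}),
           Event(3, "ad_impression", "a2")]
    for event in build_sequence([organic, ads], max_len=3):
        print(event.timestamp, event.source, event.item_id)
```

Keeping the most recent events when truncating is one possible choice that favors freshness; other truncation or sampling policies would fit the same interface.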

📝 Abstract
Diverse and enriched data sources are essential for commercial ads-recommendation models to accurately assess user interest both before and after engagement with content. While extended user-engagement histories can improve the prediction of user interests, it is equally important to embed activity sequences from multiple sources to ensure freshness of user and ad-representations, following scaling law principles. In this paper, we present a novel three-dimensional framework for enhancing user-ad representations without increasing model inference or serving complexity. The first dimension examines the impact of incorporating diverse event sources, the second considers the benefits of longer user histories, and the third focuses on enriching data with additional event attributes and multi-modal embeddings. We assess the return on investment (ROI) of our source enrichment framework by comparing organic user engagement sources, such as content viewing, with ad-impression sources. The proposed method can boost the area under curve (AUC) and the slope of scaling curves for ad-impression sources by 1.56 to 2 times compared to organic usage sources even for short online-sequence lengths of 100 to 10K. Additionally, click-through rate (CTR) prediction improves by 0.56% AUC over the baseline production ad-recommendation system when using enriched ad-impression event sources, leading to improved sequence scaling resolutions for longer and offline user-ad representations.
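
The "slope of scaling curves" comparison can be illustrated with a short sketch. It assumes the scaling curve is AUC plotted against the base-10 logarithm of sequence length and fits a straight line by least squares; the abstract does not state the exact definition, so this is one plausible reading. The AUC values below are made-up placeholders chosen only to exercise the code and are not results from the paper.

```python
import numpy as np


def scaling_slope(seq_lengths, aucs):
    """Least-squares slope of AUC against log10(sequence length)."""
    x = np.log10(np.asarray(seq_lengths, dtype=float))
    y = np.asarray(aucs, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


def slope_ratio(seq_lengths, aucs_a, aucs_b):
    """Ratio of the scaling slope of source A to that of source B."""
    return scaling_slope(seq_lengths, aucs_a) / scaling_slope(seq_lengths, aucs_b)


if __name__ == "__main__":
    lengths = [100, 1_000, 10_000]              # sequence lengths in the 100 to 10K range
    auc_ad_impression = [0.700, 0.702, 0.704]   # placeholder values, NOT from the paper
    auc_organic = [0.700, 0.701, 0.702]         # placeholder values, NOT from the paper
    print("slope ratio:", slope_ratio(lengths, auc_ad_impression, auc_organic))
```

A ratio above 1 under this definition would mean AUC grows faster with sequence length for the first source than for the second.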

Problem

Research questions and friction points this paper is trying to address.

ads-ranking
feature enrichment
user representation
embedding
data source diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

feature-enriched embeddings
ads-ranking
multi-source user engagement
sequence modeling
scaling laws