Memento: Personalized RAG-Style Long-Retention Data Scaling for META Ads Recommendation

📅 2026-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of long-history modeling in advertising recommendation—namely attention dilution, computational inefficiency, and catastrophic forgetting—where conventional truncation strategies like LastN prove inadequate. Treating user historical interactions as a document corpus and ad requests as queries, the authors propose a personalized retrieval-augmented framework based on Maximal Marginal Relevance (MMR) to recall both relevant and diverse long-term behaviors. They introduce a novel dual-path mechanism comprising Representation Memento and Data Memento, co-optimized with infrastructure enhancements to efficiently leverage over 365 days of user history for the first time in ad recommendation. The deployed system integrates temporal chunking, INT8 quantization, and asynchronous serving, achieving sub-10ms online latency while improving click-through rate by 1%, conversion rate by 1.2%, and normalized entropy by 0.25–0.3%, with 5–10× better resource efficiency than linear scaling.
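The MMR-based recall described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that history items and the ad request are already embedded as plain vectors, that cosine similarity is the relevance measure, and that a single weight `lam` trades relevance against redundancy. The names `cosine`, `mmr_select`, `history`, and `lam` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, history, k, lam=0.5):
    """Greedy Maximal Marginal Relevance over a user's history.

    query   -- embedding of the ad request (assumed given)
    history -- list of embeddings of past interactions (assumed given)
    k       -- number of interactions to retrieve
    lam     -- 1.0 means pure relevance; lower values favor diversity
    Returns the indices of the selected history items in pick order.
    """
    remaining = list(range(len(history)))
    selected = []
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, history[i])
            # Redundancy: highest similarity to anything already picked.
            redundancy = max(
                (cosine(history[i], history[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 is less relevant
# to the query but points in a different direction.
query = [1.0, 0.0]
history = [[1.0, 0.05], [1.0, 0.06], [0.6, -0.8]]
diverse = mmr_select(query, history, k=2, lam=0.5)
relevance_only = mmr_select(query, history, k=2, lam=1.0)
```

On this toy data, `lam=0.5` skips the near-duplicate and returns items 0 and 2, while `lam=1.0` reduces to plain top-k by relevance and returns items 0 and 1.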
📝 Abstract
Modeling long user histories suffers from attention dilution over long context windows, poor system efficiency, and catastrophic forgetting, and naive linear scaling approaches such as LastN fail to address these problems. We introduce Memento, a personalized retrieval-augmented framework that treats historical user engagements as a document corpus and ad requests as queries, retrieving relevant interactions via Maximal Marginal Relevance (MMR) to balance similarity with diversity. We identify two complementary applications: Representation Memento, which retrieves historical embeddings for feature augmentation, and Data Memento, which retrieves past training examples for multipass training. Through infrastructure co-design (temporal chunking, INT8 quantization, and asynchronous serving), Memento achieves 5-10× better resource efficiency than linear scaling. Memento processes daily requests with sub-10ms latency, yielding a 0.25-0.3% Normalized Entropy gain on both click-through and conversion prediction. In production, Memento delivers a 1% CTR lift on Facebook Feed and Reels and a 1.2% CVR lift, scaling personalization to 365+ days of history.
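The abstract lists INT8 quantization among the infrastructure optimizations but does not say what is quantized or how. As a rough illustration only, the sketch below assumes that stored history embeddings are quantized with symmetric per-vector scaling, which is one common scheme and not necessarily the one used in Memento. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(vec):
    """Symmetric per-vector quantization to integer codes in [-127, 127].

    Returns (codes, scale) so that code * scale approximates the input.
    An all-zero vector falls back to scale 1.0 to avoid dividing by zero.
    """
    scale = max(abs(x) for x in vec) / 127.0 or 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Toy embedding; each value is stored in one byte plus a shared float scale.
embedding = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(embedding)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), in exchange for storing one byte per dimension instead of a 4-byte float.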
Problem

Research questions and friction points this paper is trying to address.

long-history modeling
catastrophic forgetting
attention dilution
personalized recommendation
data scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation (RAG)
Maximal Marginal Relevance (MMR)
Long-Term User History Modeling
Infrastructure Co-Design
Personalized Recommendation