🤖 AI Summary
Modeling extremely long user behavior sequences (mean 40K, peak 70K steps) in recommender systems incurs prohibitive computational overhead, excessive GPU resource consumption, and high data-center power usage.
Method: This paper proposes an offline embedding paradigm to replace end-to-end sequential modeling. It introduces (1) a novel multi-slice summarization learning mechanism that captures users' long-term stable interests via multi-granularity sequence slicing and interest aggregation; (2) DV365, a lightweight, reusable, and highly incremental offline embedding that enables ultra-long history modeling without real-time sequence computation; and (3) a joint strategy combining foundation-model distillation with incremental feature fusion.
Results: The approach has been deployed across 15 production recommendation models on Instagram and Threads, operating stably for over one year. It achieves significant improvements in recommendation quality while substantially reducing GPU resource utilization and power consumption.
📝 Abstract
Long user history is a highly valuable signal for recommendation systems, but effectively incorporating it often comes at high cost in terms of data center power consumption and GPU resources. In this work, we chose offline embedding over end-to-end sequence length optimization methods to enable extremely long user sequence modeling as a cost-effective solution, and propose a new user embedding learning strategy, multi-slicing and summarization, that generates a highly generalizable representation of a user's long-term stable interests. The history length encoded in this embedding is up to 70,000 events (40,000 on average). This embedding, named DV365, is proven highly incremental on top of advanced attentive user sequence models deployed at Instagram. Produced by a single upstream foundation model, it is launched in 15 different models across Instagram and Threads with significant impact, and has been production battle-proven for >1 year since our first launch.
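The paper's exact architecture is not reproduced here, but the core multi-slicing idea can be illustrated with a minimal NumPy sketch: slice the event-embedding sequence at several recency granularities and aggregate each slice into a fixed-size interest vector, then concatenate the results into one offline user embedding. The slice sizes and the mean-pooling used below are hypothetical stand-ins for the learned summarization described in the paper.

```python
import numpy as np

def multi_slice_summarize(seq_emb: np.ndarray,
                          slice_sizes=(1_000, 5_000, 20_000)) -> np.ndarray:
    """Sketch of multi-granularity slicing + interest aggregation.

    seq_emb: (num_events, dim) array of per-event embeddings,
             oldest first. Slice sizes are illustrative, not the
             paper's actual configuration.
    """
    summaries = []
    for size in slice_sizes:
        recent = seq_emb[-size:]             # most recent `size` events
        summaries.append(recent.mean(axis=0))  # mean-pool as a stand-in aggregator
    # Concatenate per-slice summaries into one fixed-size user embedding.
    return np.concatenate(summaries)

# Example: a 40,000-event history with 64-dim item embeddings
seq = np.random.rand(40_000, 64).astype(np.float32)
user_emb = multi_slice_summarize(seq)
print(user_emb.shape)  # (192,)
```

Because the output is a fixed-size vector regardless of history length, it can be computed offline once per user and consumed by many downstream ranking models without any real-time sequence processing.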