MDL: A Unified Multi-Distribution Learner in Large-scale Industrial Recommendation through Tokenization

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently leveraging large-model parameters in industrial recommendation systems under multi-scenario, multi-task settings, where existing approaches lack a unified framework for jointly modeling scenarios and tasks. The authors propose the MDL framework, which adapts the prompting paradigm from large language models to recommender systems by encoding scenario and task information as dedicated tokens. A three-level attention mechanism — feature self-attention, domain-feature attention, and domain-fused aggregation — enables deep interaction among features, scenarios, and tasks, with scenario/task tokens adaptively activating model parameters. The method supports joint prediction across multiple distributions. Deployed on the Douyin Search platform, it achieved a +0.0626% improvement in LT30 and a -0.3267% reduction in change query rate, outperforming state-of-the-art baselines, and now serves hundreds of millions of users in full production.

📝 Abstract
Industrial recommender systems increasingly adopt multi-scenario learning (MSL) and multi-task learning (MTL) to handle diverse user interactions and contexts, but existing approaches suffer from two critical drawbacks: (1) underutilization of large-scale model parameters due to limited interaction with complex feature modules, and (2) difficulty in jointly modeling scenario and task information in a unified framework. To address these challenges, we propose a unified Multi-Distribution Learning (MDL) framework, inspired by the "prompting" paradigm in large language models (LLMs). MDL treats scenario and task information as specialized tokens rather than auxiliary inputs or gating signals. Specifically, we introduce a unified information tokenization module that transforms features, scenarios, and tasks into a unified tokenized format. To facilitate deep interaction, we design three synergistic mechanisms: (1) feature token self-attention for rich feature interactions, (2) domain-feature attention for scenario/task-adaptive feature activation, and (3) domain-fused aggregation for joint distribution prediction. By stacking these interactions, MDL enables scenario and task information to "prompt" and activate the model's vast parameter space in a bottom-up, layer-wise manner. Extensive experiments on real-world industrial datasets demonstrate that MDL significantly outperforms state-of-the-art MSL and MTL baselines. Online A/B testing on the Douyin Search platform over one month yields a +0.0626% improvement in LT30 and a -0.3267% reduction in change query rate. MDL has been fully deployed in production, serving hundreds of millions of users daily.
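The abstract's three synergistic mechanisms can be illustrated with a minimal numpy sketch. This is not the authors' implementation — the token dimensions, single attention layer, and sigmoid head are illustrative assumptions; the paper stacks these interactions layer-wise at industrial scale.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: q (m,d), k/v (n,d) -> (m,d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

d = 16
rng = np.random.default_rng(0)

# Unified information tokenization: features, scenario, and task
# all become d-dimensional tokens in one shared format.
feature_tokens = rng.normal(size=(8, d))  # e.g. 8 feature-field tokens
scenario_token = rng.normal(size=(1, d))  # e.g. a "search" scenario token
task_token = rng.normal(size=(1, d))      # e.g. a "click" task token
domain_tokens = np.vstack([scenario_token, task_token])

# (1) Feature token self-attention: rich feature interactions.
h = attention(feature_tokens, feature_tokens, feature_tokens)

# (2) Domain-feature attention: scenario/task tokens query the
#     feature representations, adaptively activating them.
domain_ctx = attention(domain_tokens, h, h)

# (3) Domain-fused aggregation: fuse the domain-conditioned context
#     into one vector for joint distribution prediction.
fused = domain_ctx.mean(axis=0)
w = rng.normal(size=d)                    # hypothetical prediction head
pred = 1.0 / (1.0 + np.exp(-(fused @ w)))
```

In the paper these three steps are stacked across layers so that the scenario and task tokens "prompt" deeper parameters bottom-up; the sketch shows a single pass only.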
Problem

Research questions and friction points this paper is trying to address.

multi-scenario learning
multi-task learning
large-scale recommendation
parameter underutilization
unified modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Distribution Learning
Tokenization
Prompting
Multi-Scenario Learning
Multi-Task Learning
Shanlei Mu
Bytedance
Recommender System · Information Retrieval · Computational Advertising
Yuchen Jiang
Alibaba Group
Shikang Wu
ByteDance Search
Shiyong Hong
ByteDance Search
Tianmu Sha
ByteDance Search
Junjie Zhang
ByteDance Search
Jie Zhu
ByteDance AML
Zhe Chen
ByteDance AML
Zhe Wang
ByteDance Search
Jingjian Lin
ByteDance Search