🤖 AI Summary
This work addresses the challenges in industrial recommendation systems, where multi-scenario data are often misaligned and existing cross-domain methods suffer from high computational costs and poor scalability. To overcome these limitations, we propose MTFM, a Transformer-based foundation model for industrial recommendation that introduces a novel paradigm for multi-scenario modeling without requiring input alignment. MTFM unifies heterogeneous cross-domain data through a shared token representation and improves efficiency via user-level sample aggregation and tailored attention mechanisms, including Grouped-Query Attention and Hybrid Target Attention. Experimental results demonstrate that MTFM consistently improves recommendation performance as model size and scenario data scale up, while achieving substantially higher training and inference throughput than existing approaches.
📝 Abstract
Industrial recommendation systems typically span multiple scenarios, yet existing cross-domain recommendation (CDR) and multi-scenario recommendation (MSR) methods often require prohibitive resources and strict input alignment, limiting their extensibility. We propose MTFM (Meituan Foundation Model for Recommendation), a Transformer-based framework that addresses these challenges. Instead of pre-aligning inputs, MTFM transforms cross-domain data into heterogeneous tokens, capturing multi-scenario knowledge in an alignment-free manner. To enhance efficiency, we first introduce multi-scenario user-level sample aggregation, which significantly improves training throughput by reducing the total number of instances. We further integrate Grouped-Query Attention and a customized Hybrid Target Attention to reduce memory usage and computational complexity. In addition, we implement system-level optimizations, such as kernel fusion and the elimination of CPU-GPU blocking, to further raise both training and inference throughput. Offline and online experiments validate the effectiveness of MTFM, demonstrating that significant performance gains are achieved by scaling both model capacity and multi-scenario training data.
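The abstract credits part of MTFM's memory savings to Grouped-Query Attention, in which several query heads share one key/value head so the KV cache shrinks by the group factor. The paper does not provide implementation details; the following is a minimal NumPy sketch of that general idea, with all shapes and the function name chosen here for illustration rather than taken from MTFM.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Scaled dot-product attention where query heads share K/V heads.

    q: (num_q_heads, seq_len, d); k, v: (num_kv_heads, seq_len, d).
    Each group of num_q_heads // num_kv_heads query heads attends to
    one shared K/V head, shrinking the KV cache by that factor.
    """
    num_q_heads, seq_len, d = q.shape
    group = num_q_heads // num_kv_heads
    # Broadcast each K/V head to every query head in its group.
    k = np.repeat(k, group, axis=0)  # (num_q_heads, seq_len, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (num_q_heads, seq_len, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))  # 8 query heads
k = rng.standard_normal((2, 4, 16))  # only 2 K/V heads: 4x smaller KV cache
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v, num_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

With `num_kv_heads` equal to the number of query heads this reduces to standard multi-head attention; setting it to 1 recovers multi-query attention, so GQA interpolates between the two.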