Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models

📅 2026-04-16

🤖 AI Summary
Existing recommender systems struggle to fully exploit sample-level information, and the structural heterogeneity between sequential and non-sequential features limits model expressiveness. To address this, this work proposes a "Sample Is Feature" (SIF) paradigm that encodes each historical raw sample directly into a sample-level token within the behavior sequence. A hierarchical group-adaptive quantization (HGAQ) mechanism is used to build a Sample Tokenizer, and a new SIF-Mixer module jointly models token-level and sample-level interactions. This design enables contextual modeling over full samples and deep interaction among homogeneous features, while preserving the time-varying sample-level attributes that earlier sequence-token designs cannot model. Experiments on a large-scale industrial dataset from Meituan Waimai (Meituan's food delivery platform) validate the method's effectiveness, and it has been deployed in production.
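The summary names HGAQ but does not spell out its mechanics. The sketch below is a minimal, hypothetical reading of "hierarchical group-adaptive quantization": a raw sample's embedding is split into feature groups, and each group is residual-quantized through its own stack of codebooks (coarse to fine), with the stack depth and codebook size allowed to differ per group. The function names (`residual_quantize`, `tokenize_sample`), the group layout, and the use of residual vector quantization are all assumptions for illustration, not the paper's implementation; the codebooks here are random rather than learned.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Residual vector quantization (assumed stand-in for the
    'hierarchical' part of HGAQ).

    Each level picks the codeword nearest to the current residual and
    subtracts it, so later levels refine earlier, coarser ones.
    Returns (list of code indices, reconstructed vector).
    """
    residual = np.asarray(x, dtype=float).copy()
    recon = np.zeros_like(residual)
    codes = []
    for cb in codebooks:  # cb: (codebook_size, group_dim)
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        recon += cb[idx]
        residual -= cb[idx]
    return codes, recon


def tokenize_sample(sample_emb, groups, group_codebooks):
    """Hypothetical Sample Tokenizer: quantize each feature group of one
    raw-sample embedding with that group's own codebook stack.

    groups: {group_name: (start, end)} slices into sample_emb.
    group_codebooks: {group_name: [level_0_codebook, level_1_codebook, ...]};
    depth and size may differ per group (the assumed 'group-adaptive' part).
    """
    tokens = {}
    for name, (start, end) in groups.items():
        codes, _ = residual_quantize(sample_emb[start:end], group_codebooks[name])
        tokens[name] = codes
    return tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    groups = {"user": (0, 4), "item": (4, 8)}  # hypothetical feature groups
    group_codebooks = {
        "user": [rng.normal(size=(8, 4)) for _ in range(2)],   # 2 levels, 8 codes each
        "item": [rng.normal(size=(16, 4)) for _ in range(3)],  # 3 levels, 16 codes each
    }
    sample = rng.normal(size=8)  # stand-in for one raw sample's embedding
    print(tokenize_sample(sample, groups, group_codebooks))
```

Storing a few small integer codes per feature group, instead of every raw feature value, is one plausible way a tokenizer like this could keep long sequences of full samples affordable; the paper's actual codebook design and trade-offs may differ.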

📝 Abstract
Scaling industrial recommender models has followed two parallel paradigms: sample information scaling, which enriches the information content of each training sample through deeper and longer behavior sequences, and model capacity scaling, which unifies sequence modeling and feature interaction within a single Transformer backbone. However, these two paradigms still face two structural limitations. First, sample information scaling methods encode only a subset of each historical interaction into the sequence token, leaving most of the original sample context unexploited and precluding the modeling of sample-level, time-varying features. Second, model capacity scaling methods are inherently constrained by the structural heterogeneity between sequential and non-sequential features, which prevents the model from fully realizing its representational capacity. To address these issues, we propose SIF (Sample Is Feature), which encodes each historical Raw Sample directly into the sequence token, maximally preserving sample information while simultaneously resolving the heterogeneity between sequential and non-sequential features. SIF consists of two key components. The Sample Tokenizer quantizes each historical Raw Sample into a Token Sample via hierarchical group-adaptive quantization (HGAQ), enabling full sample-level context to be incorporated into the sequence efficiently. The SIF-Mixer then performs deep feature interaction over the homogeneous sample representations via token-level and sample-level mixing, fully unleashing the model's representational capacity. Extensive experiments on a large-scale industrial dataset validate SIF's effectiveness, and we have successfully deployed SIF on the Meituan food delivery platform.
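The abstract describes SIF-Mixer only as "token-level and sample-level mixing" over homogeneous sample representations. The NumPy sketch below illustrates one possible shape of such a layer under stated assumptions: every sample (the historical samples and, hypothetically, the current sample) is represented by the same number T of D-dimensional tokens, so the input is one homogeneous (S, T, D) tensor; token-level mixing is an MLP-Mixer-style linear map across the tokens within a sample; and sample-level mixing is single-head self-attention across per-sample mean summaries. The function names and weight shapes are hypothetical, and normalization, nonlinearities, and learned projections are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def token_mixing(x, w_tok):
    """Mix across the T tokens inside each sample (with a residual connection).

    x: (S, T, D) = samples x tokens-per-sample x hidden dim.
    w_tok: (T, T) mixing weights shared by all samples.
    """
    # out[s, t, d] = x[s, t, d] + sum_u w_tok[t, u] * x[s, u, d]
    return x + np.einsum("tu,sud->std", w_tok, x)


def sample_mixing(x):
    """Mix across the S samples with single-head self-attention over
    per-sample summaries, then add the result back to every token."""
    summary = x.mean(axis=1)                             # (S, D)
    scores = summary @ summary.T / np.sqrt(x.shape[-1])  # (S, S)
    context = softmax(scores, axis=-1) @ summary         # (S, D)
    return x + context[:, None, :]                       # broadcast over tokens


def sif_mixer_layer(x, w_tok):
    """One hypothetical mixer layer: token-level, then sample-level mixing."""
    return sample_mixing(token_mixing(x, w_tok))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, T, D = 5, 3, 8  # e.g. 4 historical samples + 1 current sample, tokenized alike
    x = rng.normal(size=(S, T, D))
    w_tok = rng.normal(scale=0.1, size=(T, T))
    print(sif_mixer_layer(x, w_tok).shape)  # (5, 3, 8)
```

Because every sample shares the same token layout in this sketch, a single operator applies to historical and current samples alike; that is the kind of homogeneity the abstract credits with removing the sequential/non-sequential split, though the paper's actual layer may be structured quite differently.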
Problem

Research questions and friction points this paper is trying to address.

sample-level tokens
feature heterogeneity
sequence modeling
recommendation systems
model scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sample Is Feature
Sample Tokenizer
SIF-Mixer
hierarchical group-adaptive quantization
unified recommender model