🤖 AI Summary
Under the data-centric AI paradigm, sequential recommendation incurs high training costs because of large-scale user–item–time interaction data, and these discrete, sequentially correlated interactions are hard to distill. This paper proposes TD3, the first method to introduce Tucker tensor decomposition into dataset distillation for sequential recommendation. It models the three-way interactions as disentangled latent factors (synthetic users, temporal dynamics, shared items) plus a relation core that captures their interconnections. TD3 employs a bi-level meta-optimization framework whose surrogate objective aligns the feature spaces of models trained on the original data and on the synthetic summary, going beyond conventional performance matching. In the inner loop, an augmentation technique lets the learner fit the synthetic summary closely, and RaT-BPTT (Random Truncated Backpropagation Through Time) is used to accelerate the bi-level optimization and handle long dependencies. Extensive experiments on multiple public benchmarks demonstrate that TD3 achieves up to 4.2× training speedup while maintaining or improving recommendation accuracy, and generalizes across diverse model architectures. The code is publicly available.
📝 Abstract
In the era of data-centric AI, the focus of recommender systems has shifted from model-centric innovations to data-centric approaches. The success of modern AI models is built on large-scale datasets, but this also results in significant training costs. Dataset distillation has emerged as a key solution, condensing large datasets to accelerate model training while preserving model performance. However, condensing discrete and sequentially correlated user-item interactions, particularly with extensive item sets, presents considerable challenges. This paper introduces TD3, a novel Tucker Decomposition based Dataset Distillation method within a meta-learning framework, designed for sequential recommendation. TD3 distills a fully expressive synthetic sequence summary from original data. To efficiently reduce computational complexity and extract refined latent patterns, Tucker decomposition decouples the summary into four factors: synthetic user latent factor, temporal dynamics latent factor, shared item latent factor, and a relation core that models their interconnections. Additionally, a surrogate objective in bi-level optimization is proposed to align feature spaces extracted from models trained on both original data and synthetic sequence summary beyond the naive performance matching approach. In the inner-loop, an augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the outer-loop. To accelerate the optimization process and address long dependencies, RaT-BPTT is employed for bi-level optimization. Experiments and analyses on multiple public datasets have confirmed the superiority and cross-architecture generalizability of the proposed designs. Codes are released at https://github.com/USTC-StarTeam/TD3.
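To make the four-factor structure concrete, the following is a minimal NumPy sketch of a generic Tucker reconstruction of a three-way (user × position × item) summary tensor. It is an illustration under stated assumptions and not the released TD3 implementation: the function name `tucker_reconstruct`, the tensor sizes, the ranks, and the random initialization are all hypothetical, chosen only to show how three factor matrices and a core combine and why the factored form needs far fewer parameters than the dense summary.

```python
import numpy as np


def tucker_reconstruct(core, user_factor, time_factor, item_factor):
    """Rebuild a (users x positions x items) tensor from Tucker factors.

    core:        (d_u, d_t, d_i) relation core
    user_factor: (n_users, d_u)
    time_factor: (seq_len, d_t)
    item_factor: (n_items, d_i)
    """
    # Mode products of the core with each factor matrix, written as one einsum.
    return np.einsum("abc,ua,tb,ic->uti", core, user_factor, time_factor, item_factor)


# Hypothetical sizes and ranks, for illustration only.
n_users, seq_len, n_items = 50, 20, 1000
d_u, d_t, d_i = 8, 4, 16

rng = np.random.default_rng(0)
U = rng.normal(size=(n_users, d_u))   # synthetic user latent factor
T = rng.normal(size=(seq_len, d_t))   # temporal dynamics latent factor
V = rng.normal(size=(n_items, d_i))   # shared item latent factor
G = rng.normal(size=(d_u, d_t, d_i))  # relation core

summary = tucker_reconstruct(G, U, T, V)

# Parameter counts: dense summary tensor versus its Tucker factors.
dense_params = n_users * seq_len * n_items
tucker_params = U.size + T.size + V.size + G.size
print(summary.shape, dense_params, tucker_params)
```

With these illustrative sizes the dense summary has 1,000,000 entries, while the factors hold 16,992 parameters in total. This is the kind of reduction the abstract refers to when it says the decomposition lowers computational complexity; the actual ranks and sizes used by the authors are not stated here.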