TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation

📅 2025-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Under the data-centric AI paradigm, training sequential recommenders on large-scale user-item-time interaction data is costly, and the discrete, sequentially correlated nature of that data makes dataset distillation difficult. This paper proposes TD3, described as the first method to apply Tucker tensor decomposition to dataset distillation for sequential recommendation. It factorizes the synthetic sequence summary into user, temporal, and item latent factors plus a relation core that models their interactions. TD3 uses a bi-level meta-optimization framework whose surrogate objective aligns the feature spaces of models trained on the original data and on the synthetic summary, going beyond naive performance matching. In the inner loop, an augmentation technique helps the learner fit the synthetic summary closely, while RaT-BPTT (Random Truncated Backpropagation Through Time) accelerates the bi-level optimization and handles long dependencies. Experiments on multiple public benchmarks report up to 4.2× training speedup while maintaining or improving recommendation accuracy, with generalization across model architectures. The code is publicly available.

📝 Abstract

In the era of data-centric AI, the focus of recommender systems has shifted from model-centric innovations to data-centric approaches. The success of modern AI models is built on large-scale datasets, but this also results in significant training costs. Dataset distillation has emerged as a key solution, condensing large datasets to accelerate model training while preserving model performance. However, condensing discrete and sequentially correlated user-item interactions, particularly with extensive item sets, presents considerable challenges. This paper introduces TD3, a novel Tucker Decomposition based Dataset Distillation method within a meta-learning framework, designed for sequential recommendation. TD3 distills a fully expressive synthetic sequence summary from original data. To efficiently reduce computational complexity and extract refined latent patterns, Tucker decomposition decouples the summary into four factors: synthetic user latent factor, temporal dynamics latent factor, shared item latent factor, and a relation core that models their interconnections. Additionally, a surrogate objective in bi-level optimization is proposed to align feature spaces extracted from models trained on both original data and synthetic sequence summary beyond the naive performance matching approach. In the inner-loop, an augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the outer-loop. To accelerate the optimization process and address long dependencies, RaT-BPTT is employed for bi-level optimization. Experiments and analyses on multiple public datasets have confirmed the superiority and cross-architecture generalizability of the proposed designs. Codes are released at https://github.com/USTC-StarTeam/TD3.
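The four-factor decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `tucker_reconstruct`, the tensor sizes, and the ranks are all assumptions chosen for illustration. It only shows how a (users × positions × items) summary could be rebuilt from a relation core and three factor matrices, and how many fewer parameters the factorized form stores.

```python
import numpy as np

def tucker_reconstruct(core, user_f, time_f, item_f):
    """Rebuild a (users x positions x items) tensor from Tucker factors.

    core:   (r_u, r_t, r_i) relation core
    user_f: (U, r_u) synthetic user latent factor
    time_f: (T, r_t) temporal dynamics latent factor
    item_f: (I, r_i) shared item latent factor
    """
    # Contract each mode of the core with its factor matrix.
    return np.einsum("abc,ua,tb,ic->uti", core, user_f, time_f, item_f)

# Hypothetical sizes: 8 synthetic users, 5 sequence positions, 100 items.
U, T, I = 8, 5, 100
r_u, r_t, r_i = 3, 2, 4  # assumed Tucker ranks

rng = np.random.default_rng(0)
core = rng.normal(size=(r_u, r_t, r_i))
user_f = rng.normal(size=(U, r_u))
time_f = rng.normal(size=(T, r_t))
item_f = rng.normal(size=(I, r_i))

summary = tucker_reconstruct(core, user_f, time_f, item_f)

# Parameter count of the dense summary versus the factorized form.
dense_params = U * T * I
tucker_params = core.size + user_f.size + time_f.size + item_f.size
print(summary.shape, dense_params, tucker_params)
```

With these assumed sizes the dense summary holds 4000 values while the factors hold 458, which is the kind of saving that motivates factorizing the summary when the item set is large.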

Problem

Research questions and friction points this paper is trying to address.

Efficient dataset distillation for sequential recommendation

Reducing computational complexity in training

Preserving model performance with synthetic summaries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tucker decomposition for dataset distillation

Bi-level optimization with surrogate objective

RaT-BPTT for efficient optimization
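As a rough illustration of the random-truncation idea behind RaT-BPTT, the sketch below plans an inner loop in which only a randomly placed window of steps is tracked for backpropagation. It assumes RaT-BPTT unrolls the learner to a random step and backpropagates through a fixed-length window ending there; the function names, the step counts, and this exact sampling scheme are hypothetical and are not taken from the TD3 code.

```python
import random

def sample_truncation_window(total_steps, window, rng):
    """Pick a random window [start, end) of inner-loop steps to backpropagate through."""
    if not 0 < window <= total_steps:
        raise ValueError("window must be in 1..total_steps")
    end = rng.randint(window, total_steps)  # randint is inclusive on both ends
    return end - window, end

def plan_inner_loop(total_steps, window, seed=0):
    """Return (step, tracked) pairs for the steps that are actually run."""
    rng = random.Random(seed)
    start, end = sample_truncation_window(total_steps, window, rng)
    # Steps before `start` run without gradient tracking; steps in the
    # window are tracked; steps at or after `end` are not run at all.
    return [(step, step >= start) for step in range(end)]

# Hypothetical setting: up to 20 inner steps, gradients through 5 of them.
plan = plan_inner_loop(total_steps=20, window=5, seed=0)
tracked_steps = [step for step, tracked in plan if tracked]
print(tracked_steps)
```

Only the tracked steps contribute to the outer-loop gradient, so the memory and compute cost of each outer update depends on the window length rather than on the full unroll length.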
