DIET: Learning to Distill Dataset Continually for Recommender Systems

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently iterating on and comparing model architectures in large-scale recommender systems under continual learning, where full retraining is prohibitively expensive. To this end, we propose DIET, a framework that introduces dataset distillation into the continual learning setting for recommendation, constructing a dynamically evolving distilled memory. DIET employs a bi-level optimization strategy, integrating influence-based sample initialization with an influence-aware memory-addressing mechanism to enable selective updates and efficient distillation. Remarkably, with only 1–2% of the original data retained, DIET preserves performance trends consistent with full-data retraining, reducing model iteration cost by up to 60× while maintaining cross-architecture generality and reusability.

📝 Abstract
Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model development. This challenge calls for data-efficient approaches that can faithfully approximate full-data training behavior without repeatedly processing the entire evolving data stream. We formulate this problem as streaming dataset distillation for recommender systems and propose DIET, a unified framework that maintains a compact distilled dataset which evolves alongside streaming data while preserving training-critical signals. Unlike existing dataset distillation methods that construct a static distilled set, DIET models distilled data as an evolving training memory and updates it in a stage-wise manner to remain aligned with long-term training dynamics. DIET enables effective continual distillation through principled initialization from influential samples and selective updates guided by influence-aware memory addressing within a bi-level optimization framework. Experiments on large-scale recommendation benchmarks demonstrate that DIET compresses training data to as little as 1–2% of the original size while preserving performance trends consistent with full-data training, reducing model iteration cost by up to 60×. Moreover, the distilled datasets produced by DIET generalize well across different model architectures, highlighting streaming dataset distillation as a scalable and reusable data foundation for recommender system development.
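As a rough illustration of the stage-wise recipe sketched in the abstract, the toy example below distills one "stage" of synthetic click data for a logistic linear scorer: an influence-style step initializes the distilled memory from samples whose per-example gradients align with the full-stage gradient, and a bi-level loop then alternates inner model updates on the memory with outer gradient-matching updates to the distilled features. The logistic surrogate, the sizes, the influence proxy, and the gradient-matching objective are all assumptions for illustration, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "one stage" of a behavior stream: features X, binary clicks y, with a
# logistic linear scorer standing in for a recommender model. Everything
# below is a hypothetical sketch, not DIET's actual implementation.
n, d, m = 500, 8, 10                 # stage size, feature dim, distilled size
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

def grad(w, Xb, yb):
    """Average logistic-loss gradient of the linear scorer on a batch."""
    return Xb.T @ (sigmoid(Xb @ w) - yb) / len(yb)

# (1) Influence-style initialization: keep the m samples whose per-example
#     gradient aligns best with the full-stage gradient (a cheap stand-in
#     for the paper's influence-based selection).
w = np.zeros(d)
g_full = grad(w, X, y)
per_ex = (sigmoid(X @ w) - y)[:, None] * X        # per-example gradients
idx = np.argsort(-(per_ex @ g_full))[:m]
Xs, ys = X[idx].copy(), y[idx].copy()

def outer_grad(w, Xs, ys, g_full):
    """d/dXs of 0.5 * ||grad(w, Xs, ys) - g_full||^2, in closed form."""
    p = sigmoid(Xs @ w)
    diff = Xs.T @ (p - ys) / len(ys) - g_full     # gradient-matching error
    sp = p * (1.0 - p)                            # sigmoid derivative
    return ((sp * (Xs @ diff))[:, None] * w[None, :]
            + (p - ys)[:, None] * diff[None, :]) / len(ys)

# (2) Bi-level loop: the inner step trains the model on the distilled memory;
#     the outer step nudges the distilled features so that training on them
#     reproduces the full-data gradient signal.
lr_w, lr_s = 0.5, 0.1
for _ in range(200):
    w -= lr_w * grad(w, Xs, ys)                         # inner: model update
    Xs -= lr_s * outer_grad(w, Xs, ys, grad(w, X, y))   # outer: memory update

acc = float(((X @ w > 0).astype(float) == y).mean())
print(f"distilled {m}/{n} samples; accuracy of distilled-trained model: {acc:.2f}")
```

In a streaming setting, this two-step cycle would repeat per stage, with the memory carried forward and only selectively overwritten (the paper's influence-aware memory addressing); here a single stage is shown for brevity.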
Problem

Research questions and friction points this paper is trying to address.

continual learning
dataset distillation
recommender systems
data efficiency
streaming data
Innovation

Methods, ideas, or system contributions that make the work stand out.

streaming dataset distillation
continual learning
data-efficient recommendation
influence-aware memory
bi-level optimization