🤖 AI Summary
This work addresses the challenge of efficiently iterating on and comparing architectures in large-scale recommender systems under continual learning, where full retraining is prohibitively expensive. To this end, we propose DIET, a novel framework that introduces dataset distillation into the continual learning setting for recommendation, constructing a dynamically evolving distilled memory. DIET employs a bi-level optimization strategy, integrating influence-based sample initialization with an influence-aware memory addressing mechanism to enable selective updates and efficient distillation. Remarkably, with only 1–2% of the original data retained, DIET preserves performance trends consistent with full-data retraining, reducing model iteration costs by up to 60× while maintaining cross-architecture generality and reusability.
📝 Abstract
Modern deep recommender models are trained under a continual learning paradigm, relying on massive and continuously growing streaming behavioral logs. In large-scale platforms, retraining models on full historical data for architecture comparison or iteration is prohibitively expensive, severely slowing down model development. This challenge calls for data-efficient approaches that can faithfully approximate full-data training behavior without repeatedly processing the entire evolving data stream. We formulate this problem as \emph{streaming dataset distillation for recommender systems} and propose \textbf{DIET}, a unified framework that maintains a compact distilled dataset that evolves alongside streaming data while preserving training-critical signals. Unlike existing dataset distillation methods that construct a static distilled set, DIET models distilled data as an evolving training memory and updates it in a stage-wise manner to remain aligned with long-term training dynamics. DIET enables effective continual distillation through principled initialization from influential samples and selective updates guided by influence-aware memory addressing within a bi-level optimization framework. Experiments on large-scale recommendation benchmarks demonstrate that DIET compresses training data to as little as \textbf{1--2\%} of the original size while preserving performance trends consistent with full-data training, reducing model iteration cost by up to \textbf{60$\times$}. Moreover, the distilled datasets produced by DIET generalize well across different model architectures, highlighting streaming dataset distillation as a scalable and reusable data foundation for recommender system development.
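The stage-wise loop the abstract describes (an inner step that trains on the distilled memory, and an outer step that refreshes the memory via influence-aware addressing) can be sketched in miniature. Everything here is an illustrative assumption rather than DIET's actual implementation: the gradient-magnitude influence proxy, the `DistilledMemory` class, and the logistic-regression surrogate model are all stand-ins chosen to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def influence_scores(X, y, w):
    # Hypothetical influence proxy: per-sample gradient magnitude of the
    # logistic loss (not the paper's actual influence function).
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return np.abs(p - y) * np.linalg.norm(X, axis=1)

def train_inner(X, y, w, lr=0.1, steps=50):
    # Inner level of the bi-level problem: fit the surrogate model
    # on the current distilled memory by gradient descent.
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

class DistilledMemory:
    # Fixed-capacity memory standing in for the evolving distilled dataset.
    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.X = np.empty((0, dim))
        self.y = np.empty(0)

    def update(self, X_new, y_new, w):
        # Influence-aware addressing (illustrative): pool old memory with the
        # new stage's samples and keep only the most influential ones.
        X = np.vstack([self.X, X_new])
        y = np.concatenate([self.y, y_new])
        s = influence_scores(X, y, w)
        keep = np.argsort(s)[-self.capacity:]
        self.X, self.y = X[keep], y[keep]

# Simulated behavior stream: 5 stages of 1,000 binary-feedback samples,
# retaining 2% of the full stream (memory of 100 samples).
dim, stage_size, n_stages = 8, 1000, 5
memory = DistilledMemory(capacity=int(0.02 * stage_size * n_stages), dim=dim)
w = np.zeros(dim)
w_true = rng.normal(size=dim)  # hidden "preference" direction of the stream

for _ in range(n_stages):
    X = rng.normal(size=(stage_size, dim))
    y = (X @ w_true + 0.1 * rng.normal(size=stage_size) > 0).astype(float)
    memory.update(X, y, w)                  # outer step: refresh distilled memory
    w = train_inner(memory.X, memory.y, w)  # inner step: train on memory only

# Sanity check: training on the tiny memory should still recover
# the decision direction of full-data training reasonably well.
X_test = rng.normal(size=(2000, dim))
acc = ((X_test @ w > 0) == (X_test @ w_true > 0)).mean()
```

The point of the sketch is the interleaving: the memory is never static, and each refresh is scored against the current model state, which is the property the abstract contrasts with conventional one-shot dataset distillation.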