ERASE -- A Real-World Aligned Benchmark for Unlearning in Recommender Systems

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing machine unlearning benchmarks fall short of reflecting real-world recommender system scenarios: they cover a narrow range of tasks, assume unrealistic deletion scales, and neglect sequential forgetting dynamics and computational efficiency. To address these gaps, this work proposes ERASE, the first industrial-practice-oriented machine unlearning benchmark for recommender systems. ERASE spans collaborative filtering, session-based recommendation, and next-basket recommendation, and simulates the sequential removal of sensitive or spammy data. It integrates seven unlearning algorithms (both general-purpose and recommendation-specific), nine datasets, and nine mainstream models, and releases over 600 GB of reusable experimental artifacts. Empirical results show that approximate unlearning can match retraining performance in certain settings; however, general-purpose methods are unstable under repeated unlearning on attention- and recurrent-based models, whereas recommendation-specific approaches prove more reliable, revealing both the strengths and the current limitations of existing techniques.

📝 Abstract
Machine unlearning (MU) enables the removal of selected training data from trained models, to address privacy compliance, security, and liability issues in recommender systems. Existing MU benchmarks poorly reflect real-world recommender settings: they focus primarily on collaborative filtering, assume unrealistically large deletion requests, and overlook practical constraints such as sequential unlearning and efficiency. We present ERASE, a large-scale benchmark for MU in recommender systems designed to align with real-world usage. ERASE spans three core tasks -- collaborative filtering, session-based recommendation, and next-basket recommendation -- and includes unlearning scenarios inspired by real-world applications, such as sequentially removing sensitive interactions or spam. The benchmark covers seven unlearning algorithms, including general-purpose and recommender-specific methods, across nine public datasets and nine state-of-the-art models. We execute ERASE to produce more than 600 GB of reusable artifacts, such as extensive experimental logs and more than a thousand model checkpoints. Crucially, the artifacts that we release enable systematic analysis of where current unlearning methods succeed and where they fall short. ERASE showcases that approximate unlearning can match retraining in some settings, but robustness varies widely across datasets and architectures. Repeated unlearning exposes weaknesses in general-purpose methods, especially for attention-based and recurrent models, while recommender-specific approaches behave more reliably. ERASE provides the empirical foundation to help the community assess, drive, and track progress toward practical MU in recommender systems.
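The sequential-unlearning setting described in the abstract, where deletion requests arrive one batch at a time and an approximate unlearning step is compared against retraining from scratch, can be illustrated with a minimal sketch. This is not ERASE's actual harness; the toy "model" (a memorized interaction set), the `unlearn` step, and all names are hypothetical, chosen only to show the evaluation loop's shape.

```python
# Illustrative sketch of sequential unlearning evaluation (hypothetical,
# not the ERASE benchmark code). The toy "model" memorizes user-item
# interactions; unlearning simply drops the forgotten ones, so here the
# approximate method exactly matches the retrain-from-scratch gold standard.

def train(interactions):
    """'Train' a toy model: the set of memorized interactions."""
    return set(interactions)

def unlearn(model, forget_batch):
    """Approximate unlearning step: remove the requested interactions."""
    return model - set(forget_batch)

def recall(model, retained):
    """Fraction of retained interactions the model still 'knows'."""
    return len(model & set(retained)) / max(len(retained), 1)

data = [("u1", "i1"), ("u1", "i2"), ("u2", "i3"), ("u3", "i4")]
requests = [[("u1", "i2")], [("u2", "i3")]]  # sequential deletion requests

model = train(data)
retained = list(data)
for batch in requests:
    model = unlearn(model, batch)
    retained = [x for x in retained if x not in batch]
    # Gold standard: retrain from scratch on the retained data only.
    gold = train(retained)
    assert recall(model, retained) == recall(gold, retained)

print(sorted(model))
```

In a real benchmark run, `recall` would be replaced by recommendation metrics (e.g. NDCG on retained users), and the assertion by a gap measurement between the unlearned model and the retrained one.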
Problem

Research questions and friction points this paper is trying to address.

machine unlearning
recommender systems
real-world benchmark
sequential unlearning
privacy compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

machine unlearning
recommender systems
real-world benchmark
sequential unlearning
model retraining
🔎 Similar Papers
2024-05-24 · AAAI Conference on Artificial Intelligence · Citations: 1