🤖 AI Summary
This work addresses key challenges in evaluating machine unlearning methods, namely technical complexity, high engineering overhead, and the absence of standardized benchmarks, by introducing a standardized, low-barrier, and reproducible evaluation framework. Built around the KLoM (KL divergence of Margins) metric, the framework integrates precomputed model ensembles, oracle outputs, and a modular evaluation pipeline. It enables out-of-the-box, fair comparison across diverse unlearning algorithms while substantially reducing experimental cost. By supporting efficient and scalable assessment, the framework helps establish standardized evaluation protocols and best practices in machine unlearning.
📝 Abstract
Evaluating machine unlearning methods remains technically challenging, with recent benchmarks requiring complex setups and significant engineering overhead. We introduce a unified and extensible benchmarking suite that simplifies the evaluation of unlearning algorithms using the KLoM (KL divergence of Margins) metric. Our framework provides precomputed model ensembles, oracle outputs, and streamlined infrastructure for running evaluations out of the box. By standardizing setup and metrics, it enables reproducible, scalable, and fair comparison across unlearning methods. We aim for this benchmark to serve as a practical foundation for accelerating research and promoting best practices in machine unlearning. Our code and data are publicly available.
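The abstract describes KLoM as a KL divergence computed over margin distributions, comparing unlearned models against oracle (retrained-from-scratch) models. As a rough illustration only, the sketch below estimates a per-sample KL divergence between two sets of margin values under a Gaussian approximation; the function name, the Gaussian fit, and the aggregation are assumptions for illustration, not the benchmark's actual definition.

```python
import numpy as np

def klom_gaussian(unlearned_margins, oracle_margins):
    """Illustrative sketch (not the official KLoM implementation):
    KL divergence between Gaussian fits of two margin distributions
    for a single evaluation sample.

    Uses the closed form KL(N(mu_u, s_u^2) || N(mu_o, s_o^2)).
    """
    mu_u = np.mean(unlearned_margins)
    s_u = np.std(unlearned_margins) + 1e-8  # avoid division by zero
    mu_o = np.mean(oracle_margins)
    s_o = np.std(oracle_margins) + 1e-8
    return np.log(s_o / s_u) + (s_u**2 + (mu_u - mu_o)**2) / (2 * s_o**2) - 0.5
```

If the unlearned models' margins match the oracle distribution, the estimate is near zero; divergence grows as the distributions separate, which is the intuition behind using such a metric to score unlearning quality.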