🤖 AI Summary
This paper addresses the challenge of enabling efficient machine unlearning for Random Forest (RF) models under privacy regulations such as GDPR and CCPA. We propose DynFrs, the first incremental unlearning method that jointly optimizes accuracy and efficiency. Our approach introduces two key innovations: (1) Occ(q), a subsampling mechanism that explicitly bounds the influence of a target sample by limiting the proportion of trees in which it occurs; and (2) Lzy, a lazy tag strategy that defers structural node updates and reconstructs a subtree only when necessary. Crucially, our method avoids full model retraining and is compatible with diverse RF variants, including Extremely Randomized Trees (Extra-Trees), without architectural modification. Applied to Extremely Randomized Trees, it unlearns orders of magnitude faster than existing unlearning methods for Random Forests while also achieving better predictive accuracy in experiments.
📝 Abstract
Random Forests are widely recognized for their efficacy in classification and regression tasks, and they stand out in domains such as medical diagnosis, finance, and personalized recommendation. These domains, however, are inherently sensitive to privacy concerns because they involve personal and confidential data. With increasing demand for the right to be forgotten, particularly under regulations such as GDPR and CCPA, the ability to perform machine unlearning has become crucial for Random Forests. However, insufficient attention has been paid to this topic, and existing approaches are difficult to apply in real-world scenarios. To address this gap, we propose DynFrs, a framework designed to enable efficient machine unlearning in Random Forests while preserving predictive accuracy. DynFrs combines a subsampling method, Occ(q), with a lazy tag strategy, Lzy, and remains adaptable to any Random Forest variant. In essence, Occ(q) ensures that each training sample occurs in only a proportion of the trees, which limits the impact of deleting that sample, and Lzy delays the reconstruction of a tree node until it is necessary, thereby avoiding unnecessary modifications to tree structures. In experiments, applying DynFrs to Extremely Randomized Trees yields substantial improvements, achieving orders of magnitude faster unlearning and better predictive accuracy than existing machine unlearning methods for Random Forests.
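To make the two mechanisms concrete, the following is a minimal Python sketch, not the DynFrs implementation. It assumes that Occ(q) places each sample in exactly ceil(q * T) of the T trees, chosen uniformly at random, and it simplifies the lazy tag to a single flag per tree rather than per node. The names `occ_assign`, `LazyTree`, and `unlearn`, and the toy majority-label "model", are hypothetical and used only for illustration.

```python
import math
import random
from collections import Counter


def occ_assign(n_samples, n_trees, q, seed=0):
    """Assumed Occ(q)-style allocation: each sample goes to exactly
    k = ceil(q * n_trees) distinct trees, chosen uniformly at random."""
    rng = random.Random(seed)
    k = math.ceil(q * n_trees)
    tree_samples = [set() for _ in range(n_trees)]  # tree index -> sample ids
    sample_trees = []                               # sample id -> tree indices
    for i in range(n_samples):
        chosen = rng.sample(range(n_trees), k)
        sample_trees.append(chosen)
        for t in chosen:
            tree_samples[t].add(i)
    return tree_samples, sample_trees


class LazyTree:
    """Toy stand-in for a tree: it predicts the majority label of its samples.
    A deletion only sets a tag; the rebuild is deferred until predict()."""

    def __init__(self, sample_ids, labels):
        self.sample_ids = set(sample_ids)
        self.labels = labels
        self.dirty = True      # not built yet
        self.rebuilds = 0      # counts how often the structure was rebuilt
        self._majority = None

    def unlearn(self, i):
        if i in self.sample_ids:
            self.sample_ids.discard(i)
            self.dirty = True  # tag only, no reconstruction here

    def predict(self):
        if self.dirty:         # reconstruct only when a query needs it
            counts = Counter(self.labels[i] for i in self.sample_ids)
            self._majority = counts.most_common(1)[0][0] if counts else None
            self.dirty = False
            self.rebuilds += 1
        return self._majority


def unlearn(forest, sample_trees, i):
    """Delete sample i; only the trees that contain it are touched."""
    touched = sample_trees[i]
    for t in touched:
        forest[t].unlearn(i)
    return touched
```

With 8 trees and q = 0.25, deleting one sample tags only 2 of the 8 trees, and those 2 are rebuilt the next time they are queried; the remaining trees are never modified. A node-level version would apply the same tag-then-rebuild idea to individual subtrees.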