🤖 AI Summary
Current cognitive diagnosis (CD) models lack built-in privacy-preserving mechanisms, and generic machine unlearning algorithms fail to accommodate their heterogeneous parameter structures, resulting in insecure and inefficient student data removal. This paper presents the first systematic study of data unlearning for CD models, proposing Hierarchical Importance-guided Forgetting (HIF). HIF integrates instance-level and layer-level parameter importance estimation with a smooth update strategy to enable precise, controllable parameter revision. Evaluated on three real-world educational datasets, HIF significantly outperforms baseline methods, achieving an optimal trade-off among unlearning completeness (ΔAUC < 0.01), model utility (prediction accuracy degradation < 1.2%), and computational efficiency (speedup up to 2.3×). This work establishes the first verifiable, CD-specific unlearning framework for privacy-compliant data governance in educational AI.
📝 Abstract
The need to remove specific student data from cognitive diagnosis (CD) models has become a pressing requirement, driven by users' growing assertion of their "right to be forgotten". However, existing CD models are largely designed without privacy considerations and lack effective data unlearning mechanisms. Directly applying general-purpose unlearning algorithms is suboptimal, as they struggle to balance unlearning completeness, model utility, and efficiency when confronted with the unique heterogeneous structure of CD models. To address this, our paper presents the first systematic study of the data unlearning problem for CD models, proposing a novel and efficient algorithm: hierarchical importance-guided forgetting (HIF). Our key insight is that parameter importance in CD models exhibits distinct layer-wise characteristics. HIF leverages this via an innovative smoothing mechanism that combines individual- and layer-level importance, enabling a more precise distinction of parameters associated with the data to be unlearned. Experiments on three real-world datasets show that HIF significantly outperforms baselines on key metrics, offering the first effective solution for CD models to respond to user data removal requests and for deploying high-performance, privacy-preserving AI systems.
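To make the core idea concrete, here is a minimal sketch of the hierarchical smoothing step described above: per-parameter (instance-level) importance, approximated by squared gradients on the forget set, is blended with its layer-level mean before thresholding. The function names, the squared-gradient proxy, and the `alpha` / `top_frac` hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hif_smooth_importance(grads_by_layer, alpha=0.5):
    """Sketch of HIF-style hierarchical importance smoothing.

    grads_by_layer: maps a layer name to the accumulated gradient of its
    parameters on the data to be unlearned (array with the parameter shape).
    The squared gradient serves as an instance-level importance proxy; each
    entry is then blended with its layer-wise mean using the smoothing
    weight `alpha` (hypothetical hyperparameter).
    """
    smoothed = {}
    for layer, g in grads_by_layer.items():
        inst = g ** 2                      # instance-level importance proxy
        layer_mean = inst.mean()           # layer-level importance
        smoothed[layer] = alpha * inst + (1 - alpha) * layer_mean
    return smoothed

def select_forget_params(smoothed, top_frac=0.1):
    """Mask the globally most important parameters (those most tied to
    the forget data) for a subsequent targeted parameter revision."""
    all_vals = np.concatenate([v.ravel() for v in smoothed.values()])
    thresh = np.quantile(all_vals, 1 - top_frac)
    return {layer: v >= thresh for layer, v in smoothed.items()}
```

The layer-mean blend is what distinguishes this from flat, per-parameter masking: it keeps the selection aware of each layer's overall importance scale, which matters because CD models mix heterogeneous components (e.g., student embeddings vs. prediction layers).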