🤖 AI Summary
This work addresses efficient, deterministic machine unlearning in large language models, which is difficult because user data becomes entangled in shared model weights. The authors propose a three-tier architecture comprising a static base model, composable domain-expert LoRA adapters, and deletable per-user proxy artefacts, so that user data never enters shared parameters. Deleting a user's proxy then constitutes unlearning, while personalization is preserved. Because user data stays isolated, cross-user contamination is near zero and the shared components are, by construction, less exposed to model inversion and membership inference attacks. Evaluated on Phi-3.5-mini and Llama-3.1-8B, post-deletion outputs return close to the baseline distribution (KL divergence ≈ 0.21 nats, with 82–89% verification pass rates), balancing personalized performance with privacy-enhancing guarantees.
📝 Abstract
Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-layer architecture that decouples personal data from shared weights by combining a static base model, composable domain-expert LoRA adapters that shape behavior without imparting user data, and per-user proxy artefacts whose deletion constitutes deterministic unlearning. Evaluation on Phi-3.5-mini and Llama-3.1-8B confirms per-user differentiation in which personal data influences outputs while remaining isolated. This is verified by a return to baseline after proxy removal (KL divergence of approximately 0.21 nats, 82–89% verification pass rate) and by near-zero cross-user contamination. Because user-specific information never enters shared weights, the architecture mitigates model inversion, membership inference, and training-data extraction against shared model components by construction. The approach converts machine unlearning from an intractable weight-editing problem into a deterministic deletion operation that preserves personalization alongside privacy-enhancing guarantees. It is also compatible with differentially private stochastic gradient descent (DP-SGD) for privacy-preserving shared model improvement.
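The deletion-then-verify idea can be illustrated with a toy sketch. It is not the authors' implementation: `ProxyStore`, `next_token_distribution`, `verify_unlearning`, the additive logit bias standing in for a per-user proxy, and the `threshold` value are all hypothetical names and simplifications introduced here. The sketch only shows the shape of the check, namely that once a user's proxy is deleted, the output distribution for that user is compared against the shared baseline using KL divergence in nats.

```python
import math
from dataclasses import dataclass, field


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


@dataclass
class ProxyStore:
    """Hypothetical per-user proxy store; shared weights are never modified."""
    proxies: dict = field(default_factory=dict)

    def put(self, user_id, proxy):
        self.proxies[user_id] = proxy

    def delete(self, user_id):
        # Deletion is the unlearning step; returns True if a proxy was removed.
        return self.proxies.pop(user_id, None) is not None


def next_token_distribution(base_logits, proxy_bias=None):
    """Toy stand-in for a model: softmax over base logits plus an optional
    per-user additive bias (an assumption, not the paper's proxy format)."""
    logits = list(base_logits)
    if proxy_bias is not None:
        logits = [l + b for l, b in zip(logits, proxy_bias)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify_unlearning(store, user_id, base_logits, threshold):
    """Check that the user's output distribution is within `threshold` nats
    of the baseline, using whatever proxy (if any) is still stored."""
    baseline = next_token_distribution(base_logits)
    current = next_token_distribution(base_logits, store.proxies.get(user_id))
    return kl_divergence(current, baseline) <= threshold


store = ProxyStore()
base = [0.1, 0.2, 0.3]
store.put("alice", [2.0, 0.0, -1.0])

before = verify_unlearning(store, "alice", base, threshold=0.05)  # proxy present
removed = store.delete("alice")
after = verify_unlearning(store, "alice", base, threshold=0.05)   # proxy gone
```

In this deterministic toy, the distribution after deletion is identical to the baseline, so the divergence is exactly zero. The roughly 0.21 nats and 82–89% pass rates quoted above come from the paper's evaluation on real models, and an appropriate threshold there would have to be chosen empirically.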