Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

📅 2025-07-05
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing LLM unlearning methods require explicit access to, and optimization on, the sensitive data to be forgotten, which exacerbates privacy risks and violates the data minimization principle. Method: We propose Partial Model Collapse (PMC), a machine unlearning framework that leverages *partial distribution collapse*, a phenomenon that arises when autoregressive generative models are iteratively trained on their own outputs, to achieve target-free unlearning without accessing the data to be forgotten. Contribution/Results: PMC is the first approach to deliberately harness model collapse as a controllable, beneficial mechanism, and we theoretically prove its convergence to the desired unlearned state. By removing sensitive data from the objective function and combining iterative feedback with rigorous theoretical analysis, PMC avoids direct exposure to private information. Experiments show that PMC outperforms existing methods in both the thoroughness of private-information removal and privacy preservation, resolving the long-standing trade-off between unlearning efficacy and privacy.

📝 Abstract
Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their training objectives. We argue this not only risks reinforcing exposure to sensitive data but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from the model. Our core idea is to leverage this collapse for unlearning by triggering collapse partially, only on the sensitive data. We show theoretically that our approach converges to the desired outcome, i.e. the LLM unlearns the information in the forget set. We empirically demonstrate that PMC overcomes two key limitations of existing unlearning approaches that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs. Overall, our contributions represent an important step toward more comprehensive unlearning that aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.
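The collapse-then-refit loop described above can be illustrated with a toy simulation: repeatedly sampling from a model's own output distribution and refitting by maximum likelihood drives the distribution toward a point mass, destroying information. The sketch below applies this only to "forget" prompts, mimicking the partial-collapse idea; the categorical "model", function names, and parameters are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

def entropy(dist):
    """Shannon entropy (nats) of a probability distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def self_train_step(dist, n_samples, rng):
    """One self-training round: sample from the model's own output
    distribution, then refit by maximum likelihood (empirical counts)."""
    answers = list(dist)
    weights = [dist[a] for a in answers]
    samples = rng.choices(answers, weights=weights, k=n_samples)
    return {a: samples.count(a) / n_samples for a in answers}

def partial_model_collapse(model, forget_prompts, rounds=50, n_samples=5, seed=0):
    """Toy PMC sketch: trigger iterative self-training, and hence
    distribution collapse, only on the forget prompts; distributions
    for all other (retain) prompts are left untouched."""
    rng = random.Random(seed)
    unlearned = dict(model)
    for prompt in forget_prompts:
        dist = unlearned[prompt]
        for _ in range(rounds):
            dist = self_train_step(dist, n_samples, rng)
        unlearned[prompt] = dist
    return unlearned

# A toy "model": one answer distribution per prompt.
model = {
    "public fact": {"A": 0.7, "B": 0.3},
    "private fact": {"secret": 0.25, "x": 0.25, "y": 0.25, "z": 0.25},
}
unlearned = partial_model_collapse(model, ["private fact"])
```

After running, the distribution for "private fact" has strictly lower entropy than the original uniform one (finite-sample resampling is an absorbing process), while the "public fact" distribution is unchanged, which is the partial, targeted nature of the collapse.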
Problem

Research questions and friction points this paper is trying to address.

Addresses the risk that existing LLM unlearning methods reinforce exposure to the very sensitive data they aim to remove
Proposes Partial Model Collapse (PMC) to remove private information without optimizing on explicit unlearning targets
Aligns LLM unlearning with real-world privacy constraints such as data minimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

PMC harnesses partial distribution collapse from self-training as a controllable unlearning mechanism
Removes the unlearning targets from the training objective entirely
More effectively removes private information from model outputs than target-based methods