The Space Complexity of Learning-Unlearning Algorithms

📅 2025-06-16
📈 Citations: 1
Influential: 1
🤖 AI Summary
This work investigates the storage complexity of strong data deletion (machine unlearning): the minimal memory an algorithm must retain to support efficient, exact removal of any individual training sample. Focusing on realizability testing, we first demonstrate that the VC dimension fails to characterize the space needed for unlearning, exhibiting a class of constant VC dimension for which the forgetting space nevertheless requires Ω(n) bits. We then prove that, for any hypothesis class, the information the learner must store is lower bounded by its *eluder dimension*. In the ticketed memory model of Ghazi et al. (2023), we complement this with an upper bound based on the star number, which never exceeds the eluder dimension, establishing an inherent separation between centralized and ticketed memory models. Collectively, these results identify the eluder dimension as the key combinatorial complexity measure governing the memory cost of strong unlearning.
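Realizability testing, the task studied here, asks whether any hypothesis in a class labels all remaining samples correctly. A minimal brute-force sketch over a toy finite class of threshold classifiers (a hypothetical setup; the paper's hypothesis classes may be infinite):

```python
# Brute-force realizability test: does some hypothesis in H label
# every remaining sample correctly? (Toy finite-class illustration.)

def is_realizable(hypotheses, samples):
    """hypotheses: list of functions x -> label; samples: list of (x, y)."""
    return any(all(h(x) == y for x, y in samples) for h in hypotheses)

# Toy class: threshold classifiers h_t(x) = 1 iff x >= t, on integers 0..4.
H = [lambda x, t=t: int(x >= t) for t in range(6)]

print(is_realizable(H, [(1, 0), (3, 1)]))  # True: any t in {2, 3} fits
print(is_realizable(H, [(1, 1), (3, 0)]))  # False: needs t <= 1 and t > 3
```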

📝 Abstract
We study the memory complexity of machine unlearning algorithms that provide strong data deletion guarantees to the users. Formally, consider an algorithm for a particular learning task that initially receives a training dataset. Then, after learning, it receives data deletion requests from a subset of users (of arbitrary size), and the goal of unlearning is to perform the task as if the learner never received the data of deleted users. In this paper, we ask how many bits of storage are needed to be able to delete certain training samples at a later time. We focus on the task of realizability testing, where the goal is to check whether the remaining training samples are realizable within a given hypothesis class H. Toward that end, we first provide a negative result showing that the VC dimension is not a characterization of the space complexity of unlearning. In particular, we provide a hypothesis class with constant VC dimension (and Littlestone dimension), but for which any unlearning algorithm for realizability testing needs to store Ω(n) bits, where n denotes the size of the initial training dataset. In fact, we provide a stronger separation by showing that for any hypothesis class H, the amount of information that the learner needs to store, so as to perform unlearning later, is lower bounded by the *eluder dimension* of H, a combinatorial notion always larger than the VC dimension. We complement the lower bound with an upper bound in terms of the star number of the underlying hypothesis class, albeit in a stronger ticketed-memory model proposed by Ghazi et al. (2023). Since the star number for a hypothesis class is never larger than its eluder dimension, our work highlights a fundamental separation between central and ticketed memory models for machine unlearning.
Problem

Research questions and friction points this paper is trying to address.

Memory complexity of machine unlearning algorithms
Space needed for deleting training samples later
Comparison between VC dimension and eluder dimension
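The trivial way to support later deletions is to store the entire dataset and recompute from scratch, which costs Θ(n) memory; the paper asks when less suffices. A minimal sketch of this naive baseline (a hypothetical illustration, not the paper's construction):

```python
# Naive exact unlearner: keep all n samples, re-run realizability
# testing on the survivors after each deletion. Memory is Theta(n).

class NaiveUnlearner:
    def __init__(self, hypotheses, samples):
        self.H = hypotheses
        self.samples = list(samples)   # stores all n samples: Omega(n) bits

    def delete(self, sample):
        self.samples.remove(sample)    # honor one user's deletion request

    def realizable(self):
        # exact answer, exactly as if the deleted samples never arrived
        return any(all(h(x) == y for x, y in self.samples) for h in self.H)

# Toy class: threshold classifiers h_t(x) = 1 iff x >= t.
H = [lambda x, t=t: int(x >= t) for t in range(6)]
u = NaiveUnlearner(H, [(1, 1), (3, 0)])
print(u.realizable())   # False: no threshold fits both samples
u.delete((3, 0))
print(u.realizable())   # True after deletion: any t <= 1 fits (1, 1)
```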
Innovation

Methods, ideas, or system contributions that make the work stand out.

Eluder dimension lower bounds unlearning storage
Star number gives an upper bound in the ticketed-memory model
VC dimension does not characterize unlearning space complexity
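One common combinatorial formulation of the eluder dimension for binary classes (the paper's precise definition may differ) is the length of the longest sequence of labeled points in which each point can still be mislabeled by some hypothesis consistent with the prefix. A brute-force computation on a toy class of singletons:

```python
# Longest "eluder sequence" for a finite binary class: each (x_i, y_i)
# admits a hypothesis that matches the prefix (x_j, y_j), j < i, yet
# disagrees with y_i on x_i. Exhaustive search; toy scale only.

def longest_eluder_sequence(hypotheses, domain):
    best = []

    def extend(prefix):
        nonlocal best
        if len(prefix) > len(best):
            best = list(prefix)
        used = {x for x, _ in prefix}
        for x in domain:
            if x in used:
                continue
            for y in (0, 1):
                # some h fits the whole prefix but outputs 1 - y on x
                if any(h(x) != y and all(h(px) == py for px, py in prefix)
                       for h in hypotheses):
                    extend(prefix + [(x, y)])

    extend([])
    return best

# Singletons over 4 points: h_i(x) = 1 iff x == i.
H = [lambda x, i=i: int(x == i) for i in range(4)]
print(len(longest_eluder_sequence(H, range(4))))  # prints 4
```

Labeling every point 0 always leaves the singleton at the next point as a consistent disagreeing hypothesis, so under this formulation the sequence length for singletons grows with the domain size, even though their VC dimension is 1.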