🤖 AI Summary
Existing membership inference attacks primarily target static models and fail to exploit the dynamic nature of models that are updated repeatedly over their lifecycle, limiting the tightness of privacy audits. This work proposes SeMI*, the first theoretically optimal sequential membership inference attack tailored to dynamic model sequences under limited sample availability. By analyzing the sequence of model updates, SeMI* infers whether a specific sample was inserted at a given update step. The analysis shows that leveraging update sequences mitigates the dilution of membership signals caused by accumulating training data, and that the adversary can jointly optimize the insertion time and the canary to improve auditing efficacy. Experiments across diverse data distributions and models trained or fine-tuned with DP-SGD show that practical variants of SeMI* significantly outperform existing baselines, yielding substantially tighter privacy audits.
📝 Abstract
Modern AI models are not static; they go through multiple updates over their lifecycles. Thus, exploiting these model dynamics to create stronger Membership Inference (MI) attacks and tighter privacy audits is a timely question. Although the literature empirically shows that using a sequence of model updates can increase the power of MI attacks, rigorous analysis of 'optimal' MI attacks is limited to static models with infinite samples. Hence, we develop an 'optimal' MI attack, SeMI*, that uses the sequence of model updates to identify the presence of a target inserted at a certain update step. For empirical mean computation, we derive the optimal power of SeMI* given access to a finite number of samples, with or without privacy. Our results recover the existing asymptotic analysis. We observe that access to the model sequence avoids the dilution of MI signals, unlike existing attacks on the final model, where the MI signal vanishes as training data accumulates. Furthermore, an adversary can use SeMI* to tune both the insertion time and the canary to yield tighter privacy audits. Finally, we conduct experiments across data distributions and models trained or fine-tuned with DP-SGD, demonstrating that practical variants of SeMI* lead to tighter privacy audits than the baselines.
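The dilution argument can be made concrete with a toy simulation. The sketch below is *not* the paper's SeMI* algorithm; it is a minimal illustration, with assumed parameters (`d`, `batch`, `t_star`, the inner-product score, etc., all hypothetical), of why a membership score computed on the final empirical mean loses power as data accumulates, while a score computed from a single update increment does not. DP noise per released mean is omitted for brevity but could be added to each entry of `means`.

```python
# Toy contrast between final-model and update-sequence membership inference
# on released empirical means. Illustrative only; not the paper's SeMI*.
import numpy as np

rng = np.random.default_rng(0)
d, batch, T, t_star, trials = 64, 32, 50, 10, 500  # all assumed parameters

u = rng.standard_normal(d)
u /= np.linalg.norm(u)      # unit direction of the canary
canary = np.sqrt(d) * u     # canary scaled to a typical sample norm (~sqrt(d))

def run(insert_canary: bool):
    """Stream T batches; release the running empirical mean after each batch.
    Return (final-model score, update-increment score at step t_star)."""
    total, means = np.zeros(d), []
    for t in range(T):
        data = rng.standard_normal((batch, d))
        if insert_canary and t == t_star:
            data[0] = canary                 # target inserted at step t_star
        total += data.sum(axis=0)
        means.append(total / (batch * (t + 1)))  # what the adversary observes
    # Sequence attack: recover the batch-t_star sum from consecutive means.
    n_now, n_prev = batch * (t_star + 1), batch * t_star
    increment = n_now * means[t_star] - n_prev * means[t_star - 1]
    return means[-1] @ u, increment @ u      # final-only vs. sequence score

def auc(pos, neg):
    """Empirical AUC of a score separating members from non-members."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return (pos[:, None] > neg[None, :]).mean()

fin_in, seq_in = zip(*(run(True) for _ in range(trials)))
fin_out, seq_out = zip(*(run(False) for _ in range(trials)))
print(f"final-model attack AUC:     {auc(fin_in, fin_out):.3f}")
print(f"update-sequence attack AUC: {auc(seq_in, seq_out):.3f}")
```

Under these assumptions, the final-model score's signal scales as 1/N against noise of order 1/sqrt(N) (N = total samples), so its AUC drifts toward chance as T grows, while the increment score pits the canary against a single batch and its AUC stays constant, which is the dilution effect the abstract describes.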