Uncovering the Limitations of Model Inversion Evaluation: Benchmarks and Connection to Type-I Adversarial Attacks

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes a pervasive false-positive problem in model inversion (MI) attack evaluation, which leads to severe overestimation of the privacy leakage achieved by state-of-the-art methods. To address this, the authors construct the first large-scale, human-annotated MI benchmark dataset and establish, for the first time, an intrinsic connection between MI evaluation and Type-I adversarial attacks. Through controlled-variable experiments, adversarial feature analysis, and comprehensive benchmarking across diverse attacks, defenses, and data settings, they empirically demonstrate that mainstream automated evaluation metrics suffer from significantly inflated false-positive rates. Consequently, they propose a new evaluation paradigm grounded in human judgment as the ground truth and provide actionable strategies to mitigate false positives. Key contributions include: (1) the first high-quality, human-annotated MI benchmark; (2) formal identification of the adversarial nature of MI evaluation; and (3) a paradigm shift from automated metrics toward human-centered assessment standards.

📝 Abstract
Model Inversion (MI) attacks aim to reconstruct information about private training data by exploiting access to machine learning models. The most common evaluation framework for MI attacks/defenses relies on an evaluation model, and this framework has been used to assess progress across almost all MI attacks and defenses proposed in recent years. In this paper, for the first time, we present an in-depth study of MI evaluation. Firstly, we construct the first comprehensive human-annotated dataset of MI attack samples, based on 28 setups of different MI attacks, defenses, private datasets, and public datasets. Secondly, using our dataset, we examine the accuracy of the MI evaluation framework and reveal that it suffers from a significant number of false positives. These findings raise questions about the previously reported success rates of SOTA MI attacks. Thirdly, we analyze the causes of these false positives, design controlled experiments, and discover the surprising effect of Type I adversarial features on MI evaluation, as well as adversarial transferability, highlighting a relationship between two previously distinct research areas. Our findings suggest that the performance of SOTA MI attacks has been overestimated, with the actual privacy leakage being significantly less than previously reported. In conclusion, we highlight critical limitations in the widely used MI evaluation framework and present our methods to mitigate false-positive rates. We remark that prior research has shown that Type I adversarial attacks are very challenging, with no existing solution. Therefore, we urge the community to consider human evaluation as a primary MI evaluation framework rather than merely a supplement, as in previous MI research. We also encourage further work on developing more robust and reliable automatic evaluation frameworks.
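The evaluation framework the abstract critiques can be illustrated with a minimal sketch: an evaluation classifier labels each reconstructed sample, and an attack counts as successful when the predicted label matches the target identity. All names below (`attack_accuracy`, `toy_eval_model`, the sample data) are hypothetical placeholders for illustration, not the paper's actual code or models.

```python
# Sketch of the standard automated MI evaluation protocol: an evaluation
# model classifies each reconstruction, and a sample is a "success" when
# the top-1 prediction matches the target identity. The paper's point is
# that this metric can be inflated by Type-I adversarial features, i.e.
# reconstructions that fool the evaluation model without resembling the
# target to a human.

def attack_accuracy(samples, evaluation_model):
    """Fraction of (image, target) pairs the evaluation model
    assigns to their target identity."""
    hits = sum(
        1 for image, target in samples
        if evaluation_model(image) == target
    )
    return hits / len(samples)

def toy_eval_model(image):
    # Toy stand-in for an evaluation classifier: predict the index
    # of the largest feature value.
    return max(range(len(image)), key=lambda i: image[i])

# Two toy "reconstructions" targeting identity 0: the first is counted
# as a success, the second as a failure, giving 50% attack accuracy.
samples = [([0.9, 0.1, 0.0], 0), ([0.2, 0.7, 0.1], 0)]
print(attack_accuracy(samples, toy_eval_model))  # → 0.5
```

A false positive in this setting is a sample the metric counts as a hit even though human annotators would not recognize the target identity in it, which is why the paper proposes human judgment as the ground truth.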
Problem

Research questions and friction points this paper is trying to address.

Evaluating limitations in Model Inversion attack assessment methods
Identifying false positives in current MI evaluation frameworks
Exploring link between Type-I adversarial attacks and MI evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructed first human-annotated MI attack dataset
Revealed evaluation framework's high false positives
Linked Type-I adversarial features to MI evaluation