🤖 AI Summary
Existing membership inference attack (MIA) evaluations against large vision-language models (LVLMs) suffer from severe bias: high reported success rates stem largely from distributional shifts in the detection set rather than from genuine membership discrimination. Method: We propose the first distribution-balanced MIA benchmark for LVLMs with multi-stage annotations, comprising 6,000 images with strictly controlled train/test distribution alignment and precise, stage-specific membership labels for pretraining, instruction tuning, and reinforcement learning. Contribution/Results: By eliminating data-induced bias, our benchmark enables fair, rigorous evaluation of state-of-the-art MIAs. Experiments reveal that under this unbiased setting, top-performing methods degrade to random guessing (~50% accuracy), exposing substantial overestimation of their practical efficacy. This work uncovers fundamental limitations of MIAs on LVLMs and establishes a new standard for trustworthy evaluation.
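To make the bias argument concrete, here is a minimal sketch of how an attack score can be evaluated on a distribution-balanced detection set. Everything here is illustrative and assumed rather than taken from the paper's code: `attack_score` is a hypothetical stand-in for any membership-scoring method, and the synthetic features simulate members and non-members drawn from the same distribution.

```python
# Minimal sketch (not the authors' code) of MIA evaluation on a
# distribution-balanced detection set.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def attack_score(image_features: np.ndarray) -> float:
    """Placeholder membership score; a real attack would score the sample
    using model outputs (e.g., per-token loss or confidence)."""
    return float(image_features.mean())

# Balanced detection set: equal numbers of members and non-members drawn
# from the same distribution, so a score cannot exploit distribution shift.
members = rng.normal(loc=0.0, scale=1.0, size=(3000, 512))
non_members = rng.normal(loc=0.0, scale=1.0, size=(3000, 512))

scores = [attack_score(x) for x in members] + [attack_score(x) for x in non_members]
labels = [1] * len(members) + [0] * len(non_members)

# With matched distributions, a score that only detects distribution shift
# collapses to AUC ~ 0.5, i.e., random guessing.
print(f"AUC = {roc_auc_score(labels, scores):.3f}")
```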
📝 Abstract
OpenLVLM-MIA is a new benchmark that highlights fundamental challenges in evaluating membership inference attacks (MIAs) against large vision-language models (LVLMs). While prior work has reported high attack success rates, our analysis suggests that these results often arise from detecting distributional bias introduced during dataset construction rather than from identifying true membership status. To address this issue, we introduce a controlled benchmark of 6,000 images in which the distributions of member and non-member samples are carefully balanced, and ground-truth membership labels are provided across three distinct training stages. Experiments using OpenLVLM-MIA demonstrate that the performance of state-of-the-art MIA methods converges to random chance under unbiased conditions. By offering a transparent and unbiased benchmark, OpenLVLM-MIA clarifies the current limitations of MIA research on LVLMs and provides a solid foundation for developing stronger privacy-preserving techniques.
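As an illustration of what stage-specific ground-truth labels could look like, the sketch below defines a per-sample record with one membership flag per training stage. The field names and layout are assumptions for exposition, not the benchmark's released annotation schema.

```python
# Hypothetical record layout (an assumption, not the released schema) for
# stage-specific membership labels: one flag per training stage.
from dataclasses import dataclass

@dataclass
class BenchmarkSample:
    image_id: str
    member_pretraining: bool         # seen during pretraining?
    member_instruction_tuning: bool  # seen during instruction tuning?
    member_reinforcement: bool       # seen during reinforcement learning?

sample = BenchmarkSample(
    image_id="img_000042",
    member_pretraining=True,
    member_instruction_tuning=False,
    member_reinforcement=False,
)
print(sample)
```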