Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large vision-language models (LVLMs) are prone to memorizing training data, rendering them vulnerable to membership inference attacks (MIAs). Existing MIA methods predominantly rely on white-box or gray-box assumptions—requiring access to model internals such as gradients, parameters, or intermediate representations—making them inapplicable to mainstream LVLM services that expose only textual outputs. This paper proposes the first purely black-box MIA framework for LVLMs: it requires no model parameters, gradients, or internal feature representations, and instead infers membership solely from semantic anomalies in generated text. Its core innovation is a prior-knowledge calibration mechanism that effectively disentangles model memorization of private training data from general knowledge reasoning. Evaluated across four state-of-the-art LVLMs and three benchmark datasets, our method achieves performance on par with white-box and gray-box approaches while demonstrating strong robustness. Code and data are publicly released.

📝 Abstract
Large vision-language models (LVLMs) derive their capabilities from extensive training on vast corpora of visual and textual data. Empowered by large-scale parameters, these models often exhibit strong memorization of their training data, rendering them susceptible to membership inference attacks (MIAs). Existing MIA methods for LVLMs typically operate under white- or gray-box assumptions, extracting likelihood-based features for suspected data samples from the target LVLM. However, mainstream LVLM services generally expose only generated outputs and conceal internal computational features during inference, limiting the applicability of these methods. In this work, we propose the first black-box MIA framework for LVLMs, built on a prior knowledge-calibrated memory probing mechanism. The core idea is to assess the model's memorization of private semantic information embedded in the suspected image data, information that is unlikely to be inferred from general world knowledge alone. We conducted extensive experiments across four LVLMs and three datasets. Empirical results demonstrate that our method effectively identifies training data of LVLMs in a purely black-box setting and even achieves performance comparable to gray-box and white-box methods. Further analysis confirms the robustness of our method against potential adversarial manipulations and the effectiveness of its design choices. Our code and data are available at https://github.com/spmede/KCMP.
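To make the probing idea concrete, below is a minimal sketch of how a prior knowledge-calibrated membership score could be computed in a purely black-box setting. Everything here is an illustrative assumption rather than the paper's actual implementation: the `ask_with_image` / `ask_without_image` callables stand for any LVLM service that returns only generated text, and a simple string matcher stands in for a proper semantic similarity model (the repository linked above contains the real method).

```python
from difflib import SequenceMatcher
from typing import Callable

def text_similarity(a: str, b: str) -> float:
    # Placeholder for a semantic similarity model (e.g., sentence
    # embeddings); a string matcher keeps this sketch dependency-free.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def calibrated_membership_score(
    image_path: str,
    probe_question: str,      # asks about a private, image-specific detail
    reference_answer: str,    # ground-truth detail from the suspected sample
    ask_with_image: Callable[[str, str], str],   # black-box LVLM call
    ask_without_image: Callable[[str], str],     # same model, no image
) -> float:
    # Probe the model's memory of the image, then measure how much of the
    # answer it could have produced from general world knowledge alone.
    probed = ask_with_image(image_path, probe_question)
    prior = ask_without_image(probe_question)
    memorization = text_similarity(probed, reference_answer)
    prior_knowledge = text_similarity(prior, reference_answer)
    # Calibration: a positive gap suggests image-specific recall that
    # general knowledge cannot explain, i.e., a possible training member.
    return memorization - prior_knowledge
```

The subtraction step is the calibration: it discounts answers the model could produce from general world knowledge alone, so only image-specific recall contributes to the membership signal.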
Problem

Research questions and friction points this paper is trying to address.

Proposes a black-box membership inference attack for large vision-language models
Assesses the model's memorization of private semantic information embedded in images
Operates without access to internal model parameters, gradients, or representations during inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box membership inference attack for LVLMs
Prior knowledge-calibrated memory probing mechanism (see the usage sketch after this list)
Assesses model memorization of private semantic information
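Building on the hypothetical scoring sketch after the abstract, here is one way such calibrated scores might be aggregated into a membership decision. The probe set and the 0.2 threshold are illustrative assumptions, not values from the paper.

```python
# Hypothetical aggregation of calibrated scores into a membership decision.
# `probes` is a list of (probe_question, reference_answer) pairs targeting
# private, image-specific details; the threshold is illustrative only.
def infer_membership(image_path, probes, ask_with_image, ask_without_image,
                     threshold: float = 0.2) -> bool:
    scores = [
        calibrated_membership_score(image_path, question, reference,
                                    ask_with_image, ask_without_image)
        for question, reference in probes
    ]
    return sum(scores) / len(scores) > threshold
```

Averaging over several probes reduces the variance of any single generated answer, which matters in a black-box setting where only text is observable.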
Jinhua Yin
Tsinghua University
AI Security
Peiru Yang
Department of Electronic Engineering, Tsinghua University
Chen Yang
School of Computer Science, Beijing University of Posts and Telecommunications
Huili Wang
Department of Electronic Engineering, Tsinghua University
Zhiyang Hu
College of Computer Science and Technology, Xinjiang University
Shangguang Wang
Beijing University of Posts and Telecommunications
Service Computing · Edge Computing · Satellite Computing
Yongfeng Huang
PhD Student, Chinese University of Hong Kong
Natural Language Processing
Tao Qi
Tsinghua University
AI Security · Responsible AI