🤖 AI Summary
This study systematically investigates, for the first time, the vulnerability of vision-language models (VLMs) to training data privacy leakage via model inversion (MI) attacks, a setting made particularly challenging by VLMs' discrete textual outputs.
Method: We propose two novel MI paradigms: token-level (TMI and TMI-C) and sequence-level (SMI and SMI-AW), integrating a logit-maximization loss with adaptive token weighting and an optimized vocabulary representation to improve image reconstruction fidelity.
Contribution/Results: Extensive experiments demonstrate that SMI-AW significantly outperforms existing baselines across multiple state-of-the-art VLMs. Human evaluation yields an attack accuracy of 75.31%, with high visual fidelity in the reconstructions. Our work reveals previously underexplored privacy risks in VLMs and provides empirical evidence and methodological foundations for developing effective privacy-preserving mechanisms.
📝 Abstract
Model inversion (MI) attacks pose significant privacy risks by reconstructing private training data from trained neural networks. While prior works have focused on conventional unimodal DNNs, the vulnerability of vision-language models (VLMs) remains underexplored. In this paper, we conduct the first study of VLMs' vulnerability to leaking private visual training data. Tailored to VLMs' token-based generative nature, we propose a suite of novel token-based and sequence-based model inversion strategies. Specifically, we propose Token-based Model Inversion (TMI), Convergent Token-based Model Inversion (TMI-C), Sequence-based Model Inversion (SMI), and Sequence-based Model Inversion with Adaptive Token Weighting (SMI-AW). Through extensive experiments and a user study on three state-of-the-art VLMs and multiple datasets, we demonstrate, for the first time, that VLMs are susceptible to training data leakage. The experiments show that our proposed sequence-based methods, particularly SMI-AW combined with a logit-maximization loss based on vocabulary representation, achieve competitive reconstruction and outperform token-based methods in attack accuracy and visual similarity. Importantly, human evaluation of the reconstructed images yields an attack accuracy of 75.31%, underscoring the severity of model inversion threats in VLMs. Notably, we also demonstrate inversion attacks on publicly released VLMs. Our study reveals the privacy vulnerability of VLMs as they become increasingly popular across applications such as healthcare and finance.
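To make the core idea concrete, the sketch below shows a logit-maximization inversion objective with adaptive token weighting on a toy stand-in for a VLM head. This is a minimal illustration, not the paper's implementation: the linear "model", the confidence-based weighting rule, and all names (`smi_aw_loss`, `W`, `target_tokens`) are assumptions made for this example. The attacker optimizes a candidate image from noise so that a frozen model assigns maximal logits to the tokens of a known target caption; tokens the model is less confident about receive larger weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen VLM language head: one weight matrix per
# caption position, mapping a flattened "image" vector to vocabulary
# logits. A real attack would backpropagate through a full VLM.
VOCAB, DIM, SEQ = 32, 64, 5
W = rng.normal(size=(SEQ, VOCAB, DIM))
target_tokens = rng.integers(0, VOCAB, size=SEQ)  # caption of the private image

def logits(x):
    """Per-position vocabulary logits for image vector x, shape (SEQ, VOCAB)."""
    return W @ x

def smi_aw_loss(x):
    """Negative weighted sum of target-token logits (logit maximization).

    The adaptive weighting here is an illustrative guess: tokens the
    model already predicts confidently get down-weighted, so the
    optimization focuses on tokens it still gets wrong.
    """
    z = logits(x)
    p = np.exp(z - z.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)          # softmax per position
    conf = p[np.arange(SEQ), target_tokens]    # confidence in target tokens
    w = 1.0 - conf                             # low confidence -> high weight
    return -(w * z[np.arange(SEQ), target_tokens]).sum(), w

def grad(x):
    """Gradient of the loss w.r.t. x, treating the weights as constants."""
    _, w = smi_aw_loss(x)
    return -(w[:, None] * W[np.arange(SEQ), target_tokens]).sum(axis=0)

x = rng.normal(size=DIM)      # candidate reconstruction, starts as noise
for _ in range(200):          # plain gradient descent on the image
    x -= 0.05 * grad(x)
```

After optimization, the model's greedy decoding of the toy head reproduces the target caption, which is the signal a sequence-level inversion attack exploits; real attacks add image priors and operate on pixel or latent space rather than a raw vector.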