🤖 AI Summary
This work exposes a critical data reconstruction privacy risk in split inference (SI) scenarios involving vision foundation models: while prior attacks target small CNNs, high-fidelity reconstruction from the deep intermediate representations (IRs) of foundation models remains unexplored. To close this gap, the authors introduce guided diffusion into SI privacy attacks for the first time, proposing a gradient-guided iterative reconstruction method that exploits the strong generative prior of a latent diffusion model (LDM) to efficiently recover original images from IRs. The approach substantially outperforms state-of-the-art methods across multiple benchmarks; quantitative metrics (e.g., LPIPS, FID) and qualitative evaluations consistently demonstrate superior reconstruction fidelity. This constitutes the first systematic demonstration of severe privacy vulnerabilities inherent in foundation models under SI settings. To foster reproducibility and further research, the code is publicly released.
📝 Abstract
With the rise of large foundation models, split inference (SI) has emerged as a popular computational paradigm for deploying models across lightweight edge devices and cloud servers, addressing data privacy and computational cost concerns. However, most existing data reconstruction attacks have focused on smaller CNN classification models, leaving the privacy risks of foundation models in SI settings largely unexplored. To address this gap, we propose a novel data reconstruction attack based on guided diffusion, which leverages the rich prior knowledge embedded in a latent diffusion model (LDM) pre-trained on a large-scale dataset. Our method performs iterative reconstruction within the LDM's learned image prior, effectively generating high-fidelity images resembling the original data from their intermediate representations (IRs). Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, both qualitatively and quantitatively, in reconstructing data from deep-layer IRs of vision foundation models. The results highlight the urgent need for more robust privacy protection mechanisms for large models in SI scenarios. Code is available at: https://github.com/ntuaislab/DRAG.
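The attack pattern described above, iteratively updating a latent code under gradient guidance so that the decoded image reproduces the observed IR, can be sketched as follows. This is a minimal toy illustration, not the paper's actual method: the `decoder` and `edge_model` below are tiny stand-in networks (the real attack uses a pre-trained LDM decoder and the victim foundation model's client-side partition), and only the structure of the optimization loop follows the description.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions, not the paper's models):
# decoder: maps a latent code to an "image" (role of the LDM decoder).
# edge_model: client-side layers producing the intermediate representation (IR).
decoder = nn.Sequential(nn.Linear(8, 32), nn.Tanh())
edge_model = nn.Linear(32, 16)
for p in list(decoder.parameters()) + list(edge_model.parameters()):
    p.requires_grad_(False)  # both models are frozen during the attack

# The attacker observes only the IR of a private input sent to the server.
x_private = torch.randn(1, 32)
ir_observed = edge_model(x_private)

# Gradient-guided reconstruction: optimize the latent z so that the
# decoded image's IR matches the observed IR.
z = torch.zeros(1, 8, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
losses = []
for _ in range(200):
    opt.zero_grad()
    x_hat = decoder(z)                                  # candidate reconstruction
    loss = nn.functional.mse_loss(edge_model(x_hat), ir_observed)
    loss.backward()                                     # gradient w.r.t. the latent only
    opt.step()
    losses.append(loss.item())

# x_hat now approximates the private input as seen through the edge model;
# in the paper, the LDM prior constrains x_hat to look like a natural image.
```

Because optimization happens in the generator's latent space rather than in raw pixel space, the reconstruction is restricted to the model's image prior, which is what lets the attack recover plausible high-fidelity images even from deep-layer IRs.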