Semantic Leakage from Image Embeddings

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Although image embeddings do not directly reconstruct the original images, they can still leak semantic privacy. This work formally introduces the notion of "semantic leakage" and proposes SLImE, a framework that recovers semantic content solely from local semantic neighborhood structures preserved under embedding alignment, without requiring task-specific decoders. By pairing off-the-shelf embedding models with a lightweight, locally trained semantic retriever, SLImE uses a neighborhood propagation mechanism to enable efficient inference. Experiments across diverse models, including GEMINI, COHERE, NOMIC, and CLIP, demonstrate consistent recovery of semantic labels, symbols, and coherent descriptions, revealing inherent privacy risks embedded within image representations.

📝 Abstract
Image embeddings are generally assumed to pose limited privacy risk. We challenge this assumption by formalizing semantic leakage as the ability to recover semantic structures from compressed image embeddings. Surprisingly, we show that semantic leakage does not require exact reconstruction of the original image. Preserving local semantic neighborhoods under embedding alignment is sufficient to expose the intrinsic vulnerability of image embeddings. Crucially, this preserved neighborhood structure allows semantic information to propagate through a sequence of lossy mappings. Based on this conjecture, we propose Semantic Leakage from Image Embeddings (SLImE), a lightweight inference framework that reveals semantic information from standalone compressed image embeddings, incorporating a locally trained semantic retriever with off-the-shelf models, without training task-specific decoders. We thoroughly validate each step of the framework empirically, from aligned embeddings to retrieved tags, symbolic representations, and grammatical and coherent descriptions. We evaluate SLImE across a range of open and closed embedding models, including GEMINI, COHERE, NOMIC, and CLIP, and demonstrate consistent recovery of semantic information across diverse inference tasks. Our results reveal a fundamental vulnerability in image embeddings, whereby the preservation of semantic neighborhoods under alignment enables semantic leakage, highlighting challenges for privacy preservation.
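The core retrieval step the abstract describes can be illustrated with a toy sketch: if local semantic neighborhoods survive the embedding map, then a query embedding's nearest neighbors in a locally built tagged gallery reveal its semantics, with no decoder trained on the target model. Everything below (function names, the gallery data, the `k=3` cutoff) is a hypothetical illustration of nearest-neighbor tag retrieval, not the paper's actual SLImE implementation.

```python
import numpy as np

def cosine_sims(query, gallery):
    """Cosine similarity between a query vector and each row of a gallery matrix."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def retrieve_tags(query, gallery_embeds, gallery_tags, k=3):
    """Recover semantics for a standalone embedding: return the tags
    attached to its k nearest neighbors in the local gallery."""
    sims = cosine_sims(query, gallery_embeds)
    top = np.argsort(-sims)[:k]
    return [gallery_tags[i] for i in top]

rng = np.random.default_rng(0)
dim = 8

# Toy gallery: two semantic clusters ("cat", "car") with small intra-cluster noise,
# standing in for an attacker's locally embedded, labeled reference images.
centers = {"cat": rng.normal(size=dim), "car": rng.normal(size=dim)}
gallery_embeds = np.vstack(
    [centers["cat"] + 0.05 * rng.normal(size=dim) for _ in range(5)]
    + [centers["car"] + 0.05 * rng.normal(size=dim) for _ in range(5)]
)
gallery_tags = ["cat"] * 5 + ["car"] * 5

# A query embedding near the "cat" cluster leaks its semantic label
# through neighborhood structure alone.
query = centers["cat"] + 0.05 * rng.normal(size=dim)
print(retrieve_tags(query, gallery_embeds, gallery_tags))  # → ['cat', 'cat', 'cat']
```

The point of the sketch is that nothing here inverts the embedding back to pixels; only the relative geometry of the neighborhood is used, which is exactly the property the paper argues is preserved under alignment.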
Problem

Research questions and friction points this paper is trying to address.

semantic leakage
image embeddings
privacy risk
semantic neighborhoods
embedding alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic leakage
image embeddings
embedding alignment
privacy vulnerability
lightweight inference