Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address privacy leakage risks for enterprise-sensitive data in multi-tenant LLM deployments, this paper proposes the Stained Glass Transform (SGT)—a learnable, stochastic, and sequence-dependent transformation of an LLM's word embeddings. SGT is the first method to theoretically link embedding obfuscation to the mutual information of Gaussian Mixture Models (GMMs), enabling embedding-level perturbations with information-theoretic privacy guarantees and token-level posterior privacy estimates. It combines mutual information modeling, stochastic embedding perturbation, and GMM-based theoretical analysis. Experiments on mainstream LLM benchmarks (e.g., MMLU, BBH) show that SGT incurs only marginal accuracy degradation (average drop <2.1%) while reducing the success rate of reconstructing the original input by 87.3%, achieving a strong trade-off between rigorous privacy guarantees and practical inference utility.
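The summary describes SGT as a learnable, stochastic, sequence-dependent perturbation of the embedding sequence. A minimal sketch of that idea, assuming additive Gaussian noise whose scale depends on the whole sequence (the function name, `alpha` parameter, and scale rule are illustrative stand-ins, not the paper's learned transform):

```python
import math
import random

def stained_glass_sketch(embeddings, alpha=0.1, seed=None):
    """Toy sequence-dependent stochastic obfuscation of embeddings.

    For each position, add Gaussian noise whose standard deviation
    depends on the whole sequence (here: the mean L2 norm of the
    embeddings, times alpha). The real SGT learns this transform;
    this is only a hypothetical illustration of the shape of the idea.
    """
    rng = random.Random(seed)
    # Sequence-dependent noise scale: mean embedding norm of the input.
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    scale = alpha * (sum(norms) / len(norms))
    # Stochastic perturbation: i.i.d. Gaussian noise per coordinate.
    return [[x + rng.gauss(0.0, scale) for x in e] for e in embeddings]
```

Because the scale is computed from the full input sequence, two identical tokens in different contexts receive differently distributed perturbations, which is the "sequence-dependent" property the summary highlights.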

📝 Abstract
The high cost of ownership of AI compute infrastructure and the challenges of robustly serving large language models (LLMs) have led to a surge in managed Model-as-a-Service deployments. Even when enterprises choose on-premises deployments, the compute infrastructure is typically shared across many teams in order to maximize the return on investment. In both scenarios the deployed models operate only on plaintext data, so enterprise data owners must allow their data to appear in plaintext on a shared or multi-tenant compute infrastructure. As a result, data owners with private or sensitive data are hesitant about, or restricted in, what data they use with these types of deployments. In this work we introduce the Stained Glass Transform, a learned, stochastic, and sequence-dependent transformation of the word embeddings of an LLM which information-theoretically provides privacy to the input of the LLM while preserving the utility of the model. We theoretically connect a particular class of Stained Glass Transforms to the theory of mutual information of Gaussian Mixture Models. We then calculate a-posteriori privacy estimates, based on mutual information, and verify the privacy and utility of instances of transformed embeddings through token-level metrics of privacy and standard LLM performance benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Protect sensitive data in shared LLM deployments
Balance privacy and utility in transformed embeddings
Theoretically link transforms to Gaussian Mixture Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned stochastic transformation for LLM embeddings
Privacy via mutual information theory
Preserves model utility while obfuscating data
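The innovation bullets center on quantifying privacy via mutual information when the noisy embedding's marginal is a Gaussian mixture. A minimal Monte Carlo sketch of that quantity, assuming a uniform token prior and isotropic Gaussian noise (the function name, `sigma`, and sample count are illustrative assumptions, not the paper's estimator):

```python
import math
import random

def mi_token_embedding(embs, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of I(T; Z) in nats, where T is a uniformly
    random token index and Z = embs[T] + N(0, sigma^2 I).

    The marginal of Z is then a Gaussian mixture, and
    I(T; Z) = H(T) - H(T | Z). Lower mutual information means the
    noisy embedding leaks less about which token produced it.
    Illustrative sketch only, not the paper's method.
    """
    rng = random.Random(seed)
    k = len(embs)
    h_cond = 0.0
    for _ in range(n_samples):
        t = rng.randrange(k)                      # sample a token
        z = [e + rng.gauss(0.0, sigma) for e in embs[t]]  # noisy embedding
        # Log-likelihoods log p(z | token j), up to a shared constant.
        lls = [-sum((zi - ei) ** 2 for zi, ei in zip(z, e)) / (2 * sigma**2)
               for e in embs]
        # Posterior over tokens via a stable softmax (log-sum-exp trick).
        m = max(lls)
        ws = [math.exp(l - m) for l in lls]
        s = sum(ws)
        post = [w / s for w in ws]
        h_cond += -sum(p * math.log(p) for p in post if p > 0)
    h_cond /= n_samples
    return math.log(k) - h_cond  # H(T) - H(T|Z)
```

With small noise the estimate approaches H(T) = log k (the embedding reveals the token); with large noise it approaches 0, which is the dial the privacy/utility trade-off turns.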