Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption

📅 2025-10-07
🤖 AI Summary
This work addresses privacy risks arising from sharing data embeddings (activations) in collaborative learning—an underexplored setting compared to weight-sharing. Existing approaches lack formal privacy guarantees for embedding sharing and struggle to accommodate heterogeneous server-side models. We propose the first differentially private mechanism specifically designed for embedding sharing. Our method jointly designs a privacy-preserving encoder network and a lightweight utility generation network, enabling high-accuracy, low-overhead private embedding generation in a single communication round. Crucially, the mechanism is model-agnostic: it imposes no structural assumptions on the server-side model and seamlessly integrates with diverse downstream models—including deep neural networks, random forests, and XGBoost. Experiments demonstrate that, under strict (ε,δ)-differential privacy, our approach significantly reduces client-side computational overhead while maintaining performance close to non-private baselines across multiple tasks and model architectures.

📝 Abstract
Traditional collaborative learning approaches are based on sharing model weights between clients and a server. Schemes based on sharing embeddings (activations) computed from the data, however, offer advantages in resource efficiency. Several differentially private methods have been developed for sharing weights, but no such mechanisms exist so far for sharing embeddings. We propose a mechanism that learns a privacy encoding network in conjunction with a small utility generation network, such that the final embeddings it generates carry formal differential privacy guarantees. These privatized embeddings are then shared with a more powerful server, which learns a post-processing step that yields higher accuracy on machine learning tasks. We show that our co-design of collaborative and private learning requires only one round of privatized communication and less compute on the client than traditional methods. The privatized embeddings shared from the client are agnostic to the type of model (deep learning, random forests, or XGBoost) used on the server to process these activations and complete a task.
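The abstract describes jointly learning a privacy encoder and a utility network; the paper's actual training procedure is not shown here. As a minimal sketch of just the release step, one standard way to equip client-side embeddings with (ε, δ)-differential privacy is to clip each embedding's L2 norm and add calibrated Gaussian noise. The function name, clipping scheme, and noise calibration below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def privatize_embeddings(embeddings, epsilon, delta, clip_norm=1.0, rng=None):
    """Release a batch of embeddings under (epsilon, delta)-DP via the
    Gaussian mechanism (output perturbation only; calibration assumes
    epsilon <= 1). Hypothetical sketch, not the paper's mechanism."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Clip each row to L2 norm <= clip_norm, bounding the sensitivity
    # of the release to any single record.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    clipped = embeddings * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=clipped.shape)
```

A single call produces the privatized batch the client would ship to the server in one communication round; no further rounds are needed because all subsequent server-side learning is post-processing.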
Problem

Research questions and friction points this paper is trying to address.

Developing differentially private embedding sharing mechanisms
Enabling model-agnostic consumption of privatized tabular data
Reducing client computation and communication rounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Private encoding network for embedding differential privacy
Client-server co-design reduces communication rounds
Model-agnostic privatized embeddings for flexible consumption