ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion-based image generation faces two challenges: high computational overhead and leakage of sensitive information in user prompts. To address both, we propose a privacy-first cloud-device collaborative generation framework. On the cloud, a large backbone denoising model processes a set of semantically equivalent candidate prompts that differ only in sensitive attributes; on the device, a lightweight denoiser completes the remaining generation steps. The cloud returns intermediate latents to the device, and caching that exploits temporal (cross-step) and batch-level redundancy further accelerates inference. Crucially, the framework requires no trust in the cloud server and provides rigorous differential privacy guarantees while preserving semantic fidelity. Experiments across multiple benchmark datasets demonstrate that our method achieves generation quality comparable to full-cloud models, incurs less than 8% additional server overhead, and keeps device-side latency controllable. To the best of our knowledge, this is the first approach to effectively balance practicality, efficiency, and strong privacy in diffusion-based image generation.
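To make the candidate-prompt idea concrete, here is a minimal Python sketch of how a client might build an oblivious batch from a prompt template. It is an illustration under stated assumptions, not the paper's method: the template syntax, the `SENSITIVE_VALUES` vocabulary, `ObliviousBatch`, and `make_candidates` are all hypothetical. The property it demonstrates is that the batch sent to the cloud depends only on the template and the attribute, never on the user's real value, which stays on the device as `secret_index`.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical attribute vocabulary for illustration only; the paper's actual
# sensitive attributes and candidate-generation procedure may differ.
SENSITIVE_VALUES: Dict[str, List[str]] = {
    "gender": ["man", "woman"],
    "ethnicity": ["Asian", "Black", "Hispanic", "White"],
}


@dataclass
class ObliviousBatch:
    candidates: List[str]  # uploaded to the cloud server
    secret_index: int      # never leaves the device


def make_candidates(template: str, attribute: str, real_value: str) -> ObliviousBatch:
    """Expand `template` (e.g. "a photo of a {gender} doctor") into one prompt per
    value of `attribute`, in a fixed vocabulary order."""
    values = SENSITIVE_VALUES[attribute]
    if real_value not in values:
        raise ValueError(f"{real_value!r} is not a known value of {attribute!r}")
    # Fixed order: the uploaded list is identical whatever the real value is,
    # so its contents and ordering reveal nothing about the user's choice.
    candidates = [template.format(**{attribute: v}) for v in values]
    return ObliviousBatch(candidates, values.index(real_value))
```

For example, `make_candidates("a photo of a {gender} doctor", "gender", "woman")` uploads both the "man" and the "woman" prompts and keeps index 1 locally. A fixed order is used instead of a random shuffle because, in this sketch, the server never observes which latent the client later selects; a shuffle would only matter if the list were assembled in a value-dependent order.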

📝 Abstract
Diffusion models have gained significant popularity due to their remarkable capabilities in image generation, albeit at the cost of intensive computation. Meanwhile, with their widespread deployment in inference services such as Midjourney, concerns have arisen about the potential leakage of sensitive information in uploaded user prompts. Existing solutions either lack rigorous privacy guarantees or fail to strike an effective balance between utility and efficiency. To bridge this gap, we propose ObCLIP, a plug-and-play safeguard that enables oblivious cloud-device hybrid generation. By oblivious, we mean that each input prompt is transformed into a set of semantically similar candidate prompts that differ only in sensitive attributes (e.g., gender, ethnicity). The cloud server processes all candidate prompts without knowing which one is the real one, thus preventing any prompt leakage. To mitigate server cost, only a small portion of the denoising steps is performed on the large cloud model. The intermediate latents are then sent back to the client, which selects the target latent and completes the remaining denoising using a small device model. Additionally, we analyze and incorporate several cache-based accelerations that leverage temporal and batch redundancy, effectively reducing computation cost with minimal utility degradation. Extensive experiments across multiple datasets demonstrate that ObCLIP provides rigorous privacy and utility comparable to cloud models, with only slightly increased server cost.
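The split of work between cloud and device can be sketched with toy stand-ins for the two models. In this hedged Python illustration, `make_toy_denoiser` is a hypothetical placeholder for a real diffusion network (each step simply moves the latent a fixed fraction toward a prompt-dependent target), the shared initial noise is an assumption, and `cloud_phase`/`device_phase` are illustrative names, not an API from the paper. What it does show is the data flow: the server runs only the first `k` of `total_steps` steps, on every candidate, and the device keeps the single latent at its secret index and finishes locally with the smaller model.

```python
import hashlib
import math
import random
from typing import Callable, List

Latent = List[float]
# A denoiser maps (latent, prompt, step index) to the next latent.
Denoiser = Callable[[Latent, str, int], Latent]


def prompt_target(prompt: str, dim: int) -> Latent:
    """Toy conditioning: a deterministic target vector derived from the prompt text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()  # 32 bytes, so dim <= 32
    return [b / 255.0 for b in digest[:dim]]


def make_toy_denoiser(rate: float, dim: int) -> Denoiser:
    """Hypothetical stand-in for a diffusion network: each step moves the latent
    a fraction `rate` of the way toward the prompt's target."""
    def denoise_step(x: Latent, prompt: str, t: int) -> Latent:
        target = prompt_target(prompt, dim)
        return [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
    return denoise_step


def cloud_phase(candidates: List[str], noise: Latent, large: Denoiser, k: int) -> List[Latent]:
    """Server: run the first k steps for every candidate; it never learns which is real."""
    latents = [list(noise) for _ in candidates]  # assumption: shared initial noise
    for t in range(k):
        latents = [large(x, p, t) for x, p in zip(latents, candidates)]
    return latents


def device_phase(latents: List[Latent], secret_index: int, prompt: str,
                 small: Denoiser, k: int, total_steps: int) -> Latent:
    """Client: keep only the real candidate's latent and finish the remaining steps."""
    x = list(latents[secret_index])
    for t in range(k, total_steps):
        x = small(x, prompt, t)
    return x


if __name__ == "__main__":
    dim, total_steps, k = 8, 10, 3
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    candidates = ["a photo of a man doctor", "a photo of a woman doctor"]
    secret_index = 1  # known only to the device
    latents = cloud_phase(candidates, noise, make_toy_denoiser(0.5, dim), k)
    final = device_phase(latents, secret_index, candidates[secret_index],
                         make_toy_denoiser(0.3, dim), k, total_steps)
    target = prompt_target(candidates[secret_index], dim)
    print("distance to target:", math.dist(noise, target), "->", math.dist(final, target))
```

With `k` much smaller than `total_steps`, the server spends only a fraction of a full generation on each candidate, which is what keeps the extra candidates affordable; the actual trade-off in ObCLIP depends on its models and cache settings, which this toy does not capture.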
Problem

Research questions and friction points this paper is trying to address.

Addressing privacy risks posed by user prompts in cloud-based image generation
Balancing privacy protection with computational efficiency in diffusion models
Preventing sensitive information leakage while maintaining image generation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Oblivious prompt transformation for privacy preservation
Hybrid cloud-device denoising with partial server processing
Cache-based acceleration leveraging temporal and batch redundancy
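The two kinds of redundancy named above can be illustrated with generic caching helpers. This is a minimal Python sketch, not ObCLIP's actual cache design: `TemporalFeatureCache` follows the common idea of recomputing an expensive intermediate feature only every `interval` denoising steps (as in DeepCache-style feature reuse) and assumes features drift slowly between adjacent steps, while `batched_unique` is a hypothetical helper that evaluates a function once per distinct input in a batch, for instance a negative-prompt embedding shared by every candidate prompt.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

V = TypeVar("V")


class TemporalFeatureCache:
    """Reuse an expensive per-step feature across adjacent denoising steps.

    Assumption: the feature changes little between nearby steps, so refreshing it
    every `interval` steps trades a small accuracy loss for fewer full computations.
    """

    def __init__(self, interval: int) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self.full_computes = 0
        self._has_value = False
        self._value = None

    def get(self, step: int, compute: Callable[[], V]) -> V:
        # Refresh on first use and on every `interval`-th step; otherwise reuse.
        if not self._has_value or step % self.interval == 0:
            self._value = compute()
            self._has_value = True
            self.full_computes += 1
        return self._value


def batched_unique(keys: List[Hashable], compute: Callable[[Hashable], V]) -> List[V]:
    """Evaluate `compute` once per distinct key, then fan results back out in order."""
    results: Dict[Hashable, V] = {}
    for key in keys:
        if key not in results:
            results[key] = compute(key)
    return [results[key] for key in keys]
```

Over steps 0 to 9 with `interval=3`, `TemporalFeatureCache` performs 4 full computations (at steps 0, 3, 6 and 9) instead of 10; likewise, `batched_unique` applied to four candidates that share one negative prompt encodes that prompt once.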
Haoqi Wu
TikTok
Secure Multi-party Computation, Distributed Machine Learning, AI Privacy
Wei Dai
TikTok Inc.
Ming Xu
National University of Singapore
Li Wang
TikTok Inc.
Qiang Yan
Singapore Management University