SpinML: Customized Synthetic Data Generation for Private Training of Specialized ML Models

📅 2025-03-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the dual challenges of scarce task-specific annotated data and the inaccessibility of privacy-sensitive user data in personalized visual model training for smart devices. We propose a privacy-preserving, edge-cloud collaborative synthetic data generation framework. Leveraging only a small set of locally anonymized reference images—without uploading any raw private images—the server generates high-fidelity, task-customized synthetic data via three key components: differential privacy–driven feature distillation, object-mask-guided semantically controllable diffusion synthesis, and lightweight client-side preprocessing. We introduce the first fine-grained, object-level controllable synthesis mechanism, enabling users to dynamically trade off privacy (with ε ≤ 2) against utility. Evaluated on three specialized vision tasks, models trained on our synthetic data achieve 12.6%–24.3% higher accuracy than baseline methods, while eliminating the need for manual annotation.

📝 Abstract

Specialized machine learning (ML) models tailored to users' needs and requests are increasingly being deployed on smart devices with cameras to provide personalized intelligent services that take advantage of camera data. However, two primary challenges hinder the training of such models: the lack of publicly available labeled data suitable for specialized tasks, and the inaccessibility of labeled private data due to concerns about user privacy. To address these challenges, we propose a novel system, SpinML, in which the server generates customized Synthetic image data to Privately traIN a specialized ML model tailored to the user request, using only a few sanitized reference images from the user. SpinML offers users fine-grained, object-level control over the reference images, which allows users to trade off the privacy and utility of the generated synthetic data according to their privacy preferences. Through experiments on three specialized model training tasks, we demonstrate that our proposed system can enhance the performance of specialized models without compromising users' privacy preferences.

Problem

Research questions and friction points this paper is trying to address.

Lack of labeled data for specialized ML tasks

Privacy concerns limit access to private labeled data

Need for customized synthetic data for private model training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic image data for private ML training

Uses only a few sanitized reference images from users

Offers object-level control over privacy and utility
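
To make the object-level control concrete, here is a minimal sketch of how a client might sanitize a reference image before sharing it. It is an illustration under stated assumptions, not SpinML's actual pipeline: the `DetectedObject` type, the `sanitize` function, the `hidden_labels` parameter, and the solid-fill redaction are all hypothetical, and object masks are assumed to come from some on-device detector that is not shown.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only; none of these names come from the paper.
Image = list[list[int]]   # grayscale pixels, row-major
Mask = list[list[bool]]   # True where an object covers a pixel


@dataclass
class DetectedObject:
    label: str  # object category, e.g. "face" or "dog"
    mask: Mask  # per-pixel mask with the same shape as the image


def sanitize(image: Image, objects: list[DetectedObject],
             hidden_labels: set[str], fill: int = 0) -> Image:
    """Return a copy of `image` with every pixel of a hidden object set to `fill`."""
    out = [row[:] for row in image]  # copy so the private original is untouched
    for obj in objects:
        if obj.label not in hidden_labels:
            continue  # the user chose to keep this object visible
        for r, mask_row in enumerate(obj.mask):
            for c, covered in enumerate(mask_row):
                if covered:
                    out[r][c] = fill
    return out


# Toy 2x3 image with two detected objects (assumed detector output).
image = [[10, 20, 30],
         [40, 50, 60]]
objects = [
    DetectedObject("face", [[True, False, False], [True, False, False]]),
    DetectedObject("dog", [[False, False, True], [False, False, True]]),
]

# Stricter preference: hide both objects. Looser preference: hide only the face.
strict = sanitize(image, objects, {"face", "dog"})
relaxed = sanitize(image, objects, {"face"})
```

Hiding more labels removes more private content from the reference image but also leaves the server with less visual context, which is the privacy-utility trade-off the abstract describes.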

🔎 Similar Papers

Machine Learning for Synthetic Data Generation: a Review

2023-02-08 · arXiv.org · Citations: 122
