🤖 AI Summary
In collaborative learning, dataset owners and model owners must jointly preserve the confidentiality of datasets, models, and training code while also protecting the privacy of individual users—yet existing solutions fail to guarantee all of these properties at once. This paper proposes Citadel++, a system that combines Virtual Machine-level trusted execution environments (TEEs) with enhanced differential privacy mechanisms, augmented by OS-level hardened sandboxing and integrity protections. Citadel++ preserves end-to-end confidentiality for all three assets—even when the models or training code are maliciously designed—while protecting individual privacy and maintaining model utility. Evaluations show that it outperforms state-of-the-art privacy-preserving training systems by up to 543× on CPU-based and 113× on GPU-based TEEs, while meeting the confidentiality and privacy requirements of dataset and model owners.
📝 Abstract
Collaboration between dataset owners and model owners is needed to facilitate effective machine learning (ML) training. During this collaboration, however, dataset owners and model owners want to protect the confidentiality of their respective assets (i.e., datasets, models, and training code), and dataset owners additionally care about the privacy of the individual users whose data is in their datasets. Existing solutions either provide limited confidentiality for models and training code, or suffer from privacy issues due to collusion. We present Citadel++, a collaborative ML training system designed to simultaneously protect the confidentiality of datasets, models, and training code as well as the privacy of individual users. Citadel++ enhances differential privacy mechanisms to safeguard the privacy of individual user data while maintaining model utility. By employing Virtual Machine-level Trusted Execution Environments (TEEs), together with improved sandboxing and integrity mechanisms built on OS-level techniques, Citadel++ preserves the confidentiality of datasets, models, and training code, and enforces our privacy mechanisms even when the models and training code have been maliciously designed. Our experiments show that Citadel++ preserves model utility and delivers high performance while adhering to the confidentiality and privacy requirements of dataset owners and model owners, outperforming state-of-the-art privacy-preserving training systems by up to 543x on CPU and 113x on GPU TEEs.
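The abstract does not spell out Citadel++'s exact differential privacy mechanism, but the standard building block for private training is DP-SGD-style per-sample gradient clipping followed by Gaussian noise. The sketch below is purely illustrative (all names and parameters are hypothetical, not from the paper) and shows how clipping bounds each user's contribution before noise is added:

```python
import random

def dp_average_gradients(per_sample_grads, clip_norm=1.0,
                         noise_multiplier=1.1, rng=None):
    """Illustrative DP-SGD-style aggregation (NOT Citadel++'s actual code):
    clip each per-sample gradient to L2 norm `clip_norm`, sum, add
    Gaussian noise scaled by `noise_multiplier * clip_norm`, then average."""
    rng = rng or random.Random(0)  # fixed seed here only for reproducibility
    clipped = []
    for g in per_sample_grads:
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n, dim = len(clipped), len(clipped[0])
    sigma = noise_multiplier * clip_norm
    # Each coordinate: noisy sum of clipped gradients, divided by batch size.
    return [(sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / n
            for i in range(dim)]

# Two hypothetical per-sample gradients; the first exceeds the clip norm.
grads = [[3.0, 4.0], [0.5, -0.5]]
noisy_avg = dp_average_gradients(grads)
```

Because clipping caps any single sample's influence at `clip_norm`, the added noise yields a differential privacy guarantee for each user's data; systems like Citadel++ additionally rely on TEEs and sandboxing to ensure such a mechanism cannot be bypassed by malicious training code.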