IDCloak: A Practical Secure Multi-party Dataset Join Framework for Vertical Privacy-preserving Machine Learning

📅 2025-06-01

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Vertical privacy-preserving machine learning (vPPML) faces three key challenges in secure multi-party dataset join: exposure of intersection identifiers (IDs), reliance on a non-colluding auxiliary server (a strong trust assumption), and support for only two parties. Method: The paper proposes IDCloak, the first practical secure multi-party dataset join framework for vPPML that keeps IDs private without a non-colluding auxiliary server. It consists of a circuit-based multi-party private set intersection (cmPSI) protocol secure under a dishonest majority and an efficient secure shuffle protocol. The approach combines oblivious key-value stores (OKVS) with oblivious pseudorandom functions (OPRFs) to obtain secret-shared flags indicating intersection IDs, then uses those flags to align features and build a secret-shared joint training dataset. Contribution/Results: Compared with the state-of-the-art honest-majority cmPSI protocol, the proposed cmPSI is up to 7.78× faster and uses up to 8.73× less communication while tolerating a dishonest majority; the secure shuffle protocol is up to 138.34× faster and uses up to 132.13× less communication than the state-of-the-art shuffle protocol. Compared with iPrivJoin, the state-of-the-art two-party framework, IDCloak is more efficient in the two-party setting and performs comparably as the number of parties grows.


📝 Abstract

Vertical privacy-preserving machine learning (vPPML) enables multiple parties to train models on their vertically distributed datasets while keeping the datasets private. In vPPML, it is critical to perform a secure dataset join, which aligns the features corresponding to intersection IDs across datasets and forms a secret-shared joint training dataset. However, existing methods for this step can be impractical for one of three reasons: (1) they are insecure because they expose intersection IDs; (2) they rely on a strong trust assumption requiring a non-colluding auxiliary server; or (3) they are limited to the two-party setting. This paper proposes IDCloak, the first practical secure multi-party dataset join framework for vPPML that keeps IDs private without a non-colluding auxiliary server. IDCloak consists of two protocols: (1) a circuit-based multi-party private set intersection protocol (cmPSI), which obtains secret-shared flags indicating intersection IDs via an optimized communication structure combining OKVS and OPRF; and (2) a secure multi-party feature alignment protocol, which uses the secret-shared flags to obtain the secret-shared joint dataset via the proposed efficient secure shuffle protocol. Experiments show that: (1) compared to the state-of-the-art secure two-party dataset join framework (iPrivJoin), IDCloak demonstrates higher efficiency in the two-party setting and comparable performance as the number of parties increases; (2) compared to the state-of-the-art cmPSI protocol under honest majority, the proposed cmPSI protocol provides a stronger security guarantee (dishonest majority) while improving efficiency by up to $7.78\times$ in time and $8.73\times$ in communication size; (3) the proposed secure shuffle protocol outperforms the state-of-the-art shuffle protocol by up to $138.34\times$ in time and $132.13\times$ in communication size.
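To make the input/output behaviour described above concrete, here is a minimal plaintext sketch of what a secure dataset join is meant to produce: secret-shared intersection flags and a secret-shared joint dataset. It is not the IDCloak protocol and offers no security; the intersection is computed in the clear purely for illustration. The names `share`, `reconstruct`, `ideal_join`, the ring size `MOD`, and the use of additive sharing are all assumptions introduced for this example.

```python
import secrets

MOD = 2 ** 64  # assumed ring for additive sharing; not taken from the paper


def share(value, n):
    """Split an integer into n additive shares modulo MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts


def reconstruct(parts):
    """Recombine additive shares into the original value."""
    return sum(parts) % MOD


def ideal_join(datasets):
    """Plaintext stand-in for the join. datasets: one {id: [features]} per party.

    Returns (flag_shares, rows):
      flag_shares maps each ID of the first party to n shares of 0 or 1,
      rows holds, per intersection ID, the concatenated features, each shared.
    """
    n = len(datasets)
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)
    # One shared flag per ID of the first party: 1 if the ID is in every set.
    flag_shares = {i: share(int(i in common), n) for i in datasets[0]}
    # Sorted only to keep this sketch deterministic; a real protocol would
    # hide the row order (the abstract uses a secure shuffle for alignment).
    rows = []
    for i in sorted(common):
        features = [f for d in datasets for f in d[i]]
        rows.append([share(f, n) for f in features])
    return flag_shares, rows


if __name__ == "__main__":
    parties = [
        {"a": [1, 2], "b": [3, 4], "c": [5, 6]},
        {"b": [7], "c": [8], "d": [9]},
        {"c": [10], "b": [11]},
    ]
    flags, rows = ideal_join(parties)
    print({i: reconstruct(s) for i, s in flags.items()})
    print([[reconstruct(s) for s in row] for row in rows])
```

Each individual share is a uniformly random ring element, so a single party's view of a flag or feature reveals nothing on its own; only the sum of all shares recovers the value. Producing such shares without ever computing the intersection in the clear is the job of the cmPSI and feature alignment protocols.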

Problem

Research questions and friction points this paper is trying to address.

Secure multi-party dataset join for vertical privacy-preserving ML

Private ID alignment without non-colluding auxiliary servers

Efficient feature alignment via optimized communication protocols

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-party private set intersection without auxiliary server

Efficient secure shuffle for feature alignment

Optimized communication combining OKVS and OPRF
