Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset

📅 2025-07-28
🤖 AI Summary
To address the high communication overhead and the inefficiency of nonlinear operations (e.g., division, sigmoid) in Gradient Boosted Decision Tree (GBDT) training under Multi-Party Computation (MPC)–based vertical federated learning, this paper proposes an efficient privacy-preserving GBDT training framework. Methodologically, it introduces: (1) a lightweight polynomial approximation scheme that replaces MPC-unfriendly nonlinear functions, and (2) a lossy message compression mechanism applied during gradient aggregation to substantially reduce communication load. Experiments demonstrate speedups of up to 2.71× (LAN) and 2.7× (WAN) over HEP-XGB, and up to 12.21× (LAN) and 8.2× (WAN) over SiGBDT, while maintaining model accuracy comparable to plaintext XGBoost, within a ±1% to ±2% deviation. This work marks a step toward practical vertical federated GBDT training with strong privacy guarantees and little loss of utility.
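The idea of replacing the MPC-unfriendly sigmoid with a polynomial can be illustrated with a low-degree Taylor expansion. This is a generic sketch: the specific polynomial and interval that Guard-GBDT uses are not reproduced here, and the degree-3 expansion below is only one standard choice.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_taylor(x):
    # Degree-3 Taylor polynomial of the sigmoid around 0:
    #   sigma(x) ~ 1/2 + x/4 - x^3/48
    # Polynomials need only additions and multiplications, which are
    # cheap on secret shares; this stands in for whatever approximation
    # Guard-GBDT actually employs (an assumption for illustration).
    return 0.5 + x / 4.0 - x ** 3 / 48.0

# Approximation error on [-2, 2], where gradients typically concentrate
err = max(abs(sigmoid(x / 4.0) - sigmoid_taylor(x / 4.0))
          for x in range(-8, 9))
```

On a bounded interval around zero the polynomial tracks the true sigmoid closely, which is why such replacements can keep model accuracy near plaintext training while avoiding secure exponentiation and division.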

📝 Abstract
In light of increasing privacy concerns and stringent legal regulations, using secure multiparty computation (MPC) to enable collaborative GBDT model training among multiple data owners has garnered significant attention. Despite this, existing MPC-based GBDT frameworks face efficiency challenges due to high communication costs and the computation burden of non-linear operations, such as division and sigmoid calculations. In this work, we introduce Guard-GBDT, an innovative framework tailored for efficient and privacy-preserving GBDT training on vertical datasets. Guard-GBDT bypasses MPC-unfriendly division and sigmoid functions by using more streamlined approximations and reduces communication overhead by compressing the messages exchanged during gradient aggregation. We implement a prototype of Guard-GBDT and extensively evaluate its performance and accuracy on various real-world datasets. The results show that Guard-GBDT outperforms state-of-the-art HEP-XGB (CIKM'21) and SiGBDT (ASIA CCS'24) by up to 2.71× and 12.21× on LAN network and up to 2.7× and 8.2× on WAN network. Guard-GBDT also achieves comparable accuracy with SiGBDT and plaintext XGBoost (better than HEP-XGB), with a deviation of only ±1% to ±2%. Our implementation code is provided at https://github.com/XidianNSS/Guard-GBDT.git.
Problem

Research questions and friction points this paper is trying to address.

Efficient privacy-preserving GBDT training on vertical datasets
Reducing communication costs in MPC-based GBDT frameworks
Avoiding MPC-unfriendly operations like division and sigmoid
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses streamlined approximations for MPC-unfriendly operations
Reduces communication via compressed gradient aggregation messages
Achieves high efficiency and privacy in GBDT training
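The compressed gradient aggregation can be sketched with fixed-point quantization and stochastic rounding, a common way to shrink exchanged messages in federated training. The scale and bit-width below are illustrative assumptions, not Guard-GBDT's actual parameters.

```python
import math
import random

SCALE = 2 ** 8  # illustrative fixed-point scale (8 fractional bits)

def quantize(g, scale=SCALE):
    # Map a float gradient to a small integer using stochastic rounding,
    # so parties exchange low-bit integers instead of full-width floats.
    v = g * scale
    low = math.floor(v)
    return low + (1 if random.random() < v - low else 0)

def dequantize(q, scale=SCALE):
    return q / scale

g = 0.73125
q = quantize(g)
g_hat = dequantize(q)
# |g_hat - g| is bounded by 1/SCALE, so the compression is lossy
# but with a controlled per-value error
```

Stochastic rounding keeps the quantizer unbiased in expectation, so errors tend to cancel when many gradients are summed during aggregation, which is one reason lossy compression can preserve accuracy.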