FederBoost: Private Federated Learning for GBDT

๐Ÿ“… 2020-11-05
๐Ÿ›๏ธ IEEE Transactions on Dependable and Secure Computing
๐Ÿ“ˆ Citations: 59
โœจ Influential: 10
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Addressing the challenge of balancing privacy preservation and computational efficiency in GBDT-based federated learning under both vertical and horizontal partitioning, this paper proposes a lightweight, cryptography-free privacy-preserving framework. Methodologically, it leverages the key insight that GBDT relies solely on the ordinal relationships among data, not the raw feature values, to design a secure vertical training protocol based on ordinal encoding. For horizontal partitioning, it combines distributed split-point search with privacy-preserving histogram aggregation to enable efficient and secure model training. Evaluated on three public benchmark datasets, the proposed approach matches the accuracy of centralized training while running 4–5 orders of magnitude faster than state-of-the-art methods. It also significantly reduces both communication overhead and computational cost, offering a practical solution for privacy-aware, large-scale federated GBDT deployment.
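The key insight above — that GBDT split finding consults only the sort order of a feature, never its raw values — can be checked with a small sketch. The gain formula and helper below are illustrative (a simplified squared-gradient gain, not FederBoost's exact objective): replacing a feature with its ranks leaves the chosen partition unchanged.

```python
import numpy as np

def best_split(feature, grad):
    """Exhaustive split search: pick the threshold position that maximizes a
    simple squared-gradient gain. Only the sort order of `feature` is used."""
    order = np.argsort(feature)
    g = grad[order]
    total = g.sum()
    best_gain, best_pos = -np.inf, None
    left = 0.0
    for i in range(1, len(g)):
        left += g[i - 1]
        right = total - left
        gain = left**2 / i + right**2 / (len(g) - i)
        if gain > best_gain:
            best_gain, best_pos = gain, i
    # Return the set of sample indices routed to the left child.
    return best_pos, set(order[:best_pos].tolist())

rng = np.random.default_rng(0)
x = rng.normal(size=100)      # raw feature values
g = rng.normal(size=100)      # per-sample gradients

# Rank-encode the feature: this is the only information a vertical party
# would need to reveal, since split search depends solely on ordering.
ranks = x.argsort().argsort().astype(float)

pos_raw, part_raw = best_split(x, g)
pos_rank, part_rank = best_split(ranks, g)
assert pos_raw == pos_rank and part_raw == part_rank
```

Because ranks preserve the ordering of distinct values, every candidate split induces the same left/right partition under either encoding, so the selected split is identical.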
๐Ÿ“ Abstract
Federated Learning (FL) has been an emerging trend in machine learning and artificial intelligence. It allows multiple participants to collaboratively train a better global model and offers a privacy-aware paradigm for model training, since it does not require participants to release their original training data. However, existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this article, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both vertically and horizontally partitioned data. Vertical FederBoost does not require any cryptographic operation, and horizontal FederBoost only requires lightweight secure aggregation. The key observation is that the whole training process of GBDT relies on the ordering of the data instead of the values.
We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments performed on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of accuracy as centralized training, where all data are collected in a central server, and that they are 4–5 orders of magnitude faster than the state-of-the-art solutions for federated decision tree training, hence offering practical solutions for industrial applications.
Problem

Research questions and friction points this paper is trying to address.

Private federated learning for GBDT
Efficient training without heavy cryptography
Supports vertically and horizontally partitioned data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Private Federated Learning
Gradient Boosting Decision Trees
Lightweight Secure Aggregation
๐Ÿ”Ž Similar Papers
No similar papers found.
Zhihua Tian
Zhejiang University
Trustworthy AI · Generative Models · Federated Learning
Rui Zhang
Zhejiang University, Hangzhou 310000, China
Xiaoyang Hou
Zhejiang University
Lingjuan Lyu
Sony
Foundation Models · Federated Learning · Responsible AI
Tianyi Zhang
Amazon
Jian Liu
Zhejiang University, Hangzhou 310000, China
K. Ren
Zhejiang University, Hangzhou 310000, China