Reliable Imputed-Sample Assisted Vertical Federated Learning

2025-01-11
AI Summary
In vertical federated learning (VFL), sparse sample intersections lead to underutilization of non-overlapping samples, while existing missing-data imputation methods neglect imputation quality. Method: This paper proposes a novel training paradigm leveraging high-fidelity imputed samples, featuring (i) the first reliability assessment mechanism for imputed samples grounded in Dempster–Shafer evidence theory, enabling adaptive selection of low-uncertainty samples; and (ii) a privacy-preserving VFL collaborative modeling framework integrating multiple imputation, allowing trustworthy cross-silo imputation without raw data sharing. Results: On CIFAR-10 with only 1% sample overlap, our method achieves a 48% accuracy gain over baseline VFL approaches, demonstrating superior generalization under ultra-sparse intersection settings and systematically unlocking the utility of non-overlapping samples.

๐Ÿ“ Abstract
Vertical Federated Learning (VFL) is a well-known FL variant that enables multiple parties to collaboratively train a model without sharing their raw data. Existing VFL approaches focus on overlapping samples among different parties, while their performance is constrained by the limited number of these samples, leaving numerous non-overlapping samples unexplored. Some previous work has explored techniques for imputing missing values in samples, but often without adequate attention to the quality of the imputed samples. To address this issue, we propose a Reliable Imputed-Sample Assisted (RISA) VFL framework to effectively exploit non-overlapping samples by selecting reliable imputed samples for training VFL models. Specifically, after imputing non-overlapping samples, we introduce evidence theory to estimate the uncertainty of imputed samples, and only samples with low uncertainty are selected. In this way, high-quality non-overlapping samples are utilized to improve VFL model. Experiments on two widely used datasets demonstrate the significant performance gains achieved by the RISA, especially with the limited overlapping samples, e.g., a 48% accuracy gain on CIFAR-10 with only 1% overlapping samples.
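The selection step described in the abstract (estimate the uncertainty of each imputed sample, keep only the low-uncertainty ones) can be illustrated with a minimal sketch. The listing does not give the paper's exact formulation, so everything here is an assumption: the sketch uses the common subjective-logic reading of evidence theory, where per-class evidence is ReLU(logit), Dirichlet parameters are evidence + 1, and the uncertainty mass is u = K / S for K classes and total strength S. The names `evidential_uncertainty`, `select_reliable`, and `threshold` are hypothetical and not taken from RISA.

```python
from typing import List, Sequence


def evidential_uncertainty(logits: Sequence[float]) -> float:
    """Uncertainty mass u = K / S for one sample.

    Assumption (not from the source): evidence is ReLU(logit) and the
    Dirichlet parameters are alpha_k = evidence_k + 1.
    """
    num_classes = len(logits)
    evidence = [max(0.0, z) for z in logits]  # non-negative evidence per class
    alpha = [e + 1.0 for e in evidence]       # Dirichlet parameters
    strength = sum(alpha)                     # S = sum of alpha_k
    return num_classes / strength             # lies in (0, 1]; 1 means no evidence


def select_reliable(
    logits_batch: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return indices of imputed samples whose uncertainty is below threshold."""
    return [
        i
        for i, logits in enumerate(logits_batch)
        if evidential_uncertainty(logits) < threshold
    ]


if __name__ == "__main__":
    # Hypothetical model outputs for three imputed samples (3 classes each).
    batch = [
        [9.0, 0.0, 0.0],  # strong evidence for one class
        [0.1, 0.2, 0.0],  # almost no evidence
        [4.0, 3.0, 0.0],  # moderate evidence split across two classes
    ]
    for logits in batch:
        print(logits, round(evidential_uncertainty(logits), 3))
    print("kept:", select_reliable(batch, threshold=0.5))
```

In this sketch the threshold is a fixed hyperparameter; the summary above describes the selection as adaptive, so the actual framework presumably chooses it differently.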
Problem

Research questions and friction points this paper is trying to address.

Vertical Federated Learning
Non-shared Samples Utilization
Missing Data Imputation Quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

RISA Framework
Vertical Federated Learning (VFL)
Quality Imputation Samples
Yaopei Zeng
The Pennsylvania State University, State College, USA
Lei Liu
The Chinese University of Hong Kong, Shenzhen
Shaoguo Liu
Alibaba Corporation
Machine Learning, Computer Vision
Hongjian Dou
Alibaba
Recommender System
Baoyuan Wu
Associate Professor, CUHK-SZ
AI Security and Privacy, Machine Learning, Computer Vision, Optimization
Li Liu
Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China