🤖 AI Summary
Existing vertical federated learning (VFL) methods are constrained by strict assumptions on participant count, data alignment quality, and label availability, limiting their applicability to real-world scenarios with diverse missing-data patterns and heterogeneous alignment structures. To address this, we propose the first unified VFL framework supporting arbitrary data alignment configurations, arbitrary label missingness mechanisms, and multi-party collaboration. Our approach formalizes alignment gaps as missing-data problems and leverages deep latent-variable models (specifically VAEs and VAEGANs) to jointly perform generative modeling, probabilistic inference, and privacy-preserving federated optimization. The framework accommodates unlabeled samples, dynamically adapts to varying alignment degrees, and enables multi-stage collaborative training. Evaluated across 168 configurations, it outperforms all baselines in 160 cases, with an average accuracy gain of 9.6 percentage points. Results on heterogeneous healthcare and financial benchmarks demonstrate substantial improvements in practical utility and generalization.
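The key reframing above, treating alignment gaps as a missing-data problem, can be illustrated with a minimal sketch. The user ranges, feature counts, and the mean-imputation step below are all hypothetical placeholders (the paper's method would instead infer missing blocks with a deep latent-variable model such as a VAE); the sketch only shows how two parties' partially overlapping user sets induce a joint feature table with structured missingness.

```python
import numpy as np

# Hypothetical toy setup: party A holds 2 features for users 0..5,
# party B holds 3 features for users 3..8; only users 3..5 are aligned.
rng = np.random.default_rng(0)
n_users = 9
a_users, b_users = np.arange(0, 6), np.arange(3, 9)
X_a = rng.normal(size=(len(a_users), 2))  # party A's feature block
X_b = rng.normal(size=(len(b_users), 3))  # party B's feature block

# Assemble the virtual joint table over the union of users;
# unaligned entries become missing values (NaN), tracked by a mask.
X = np.full((n_users, 5), np.nan)
X[a_users, :2] = X_a
X[b_users, 2:] = X_b
mask = ~np.isnan(X)  # observed-entry indicator

# Placeholder inference step: impute missing blocks with per-feature
# means of observed entries. A generative model (e.g. a VAE) would
# replace this with draws from the learned posterior over missing data.
col_mean = np.nanmean(X, axis=0)
X_imputed = np.where(mask, X, col_mean)
```

After assembly, `mask` encodes exactly which party observed which entry, so any alignment pattern (full, partial, or disjoint) reduces to a missingness pattern over the joint table.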
📝 Abstract
Federated learning (FL) has attracted significant attention for enabling collaborative learning without exposing private data. Among the primary variants of FL, vertical federated learning (VFL) addresses feature-partitioned data held by multiple institutions, each holding complementary information about the same set of users. However, existing VFL methods often impose restrictive assumptions, such as a small number of participating parties, fully aligned data, or reliance on fully labeled data. In this work, we reinterpret alignment gaps in VFL as missing-data problems and propose a unified framework that accommodates both training and inference under arbitrary alignment and labeling scenarios, while supporting diverse missingness mechanisms. In experiments on 168 configurations spanning four benchmark datasets, six training-time missingness patterns, and seven testing-time missingness patterns, our method outperforms all baselines in 160 cases, with an average gap of 9.6 percentage points over the next-best competitor. To the best of our knowledge, this is the first VFL framework to jointly handle arbitrary data alignment, unlabeled data, and multi-party collaboration all at once.