ICAFS: Inter-Client-Aware Feature Selection for Vertical Federated Learning

๐Ÿ“… 2025-04-15
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In vertical federated learning (VFL), collaborative feature selection across clients remains challenging, and existing methods neglect inter-client feature interactions. Method: This paper proposes the first multi-stage ensemble-based feature selection framework that explicitly models inter-client feature interactions. It introduces a cross-client-aware conditional feature synthesis mechanism and a learnable selector ensemble architecture, enabling feature-level collaborative optimization without sharing private gradientsโ€”thus preserving data privacy. The framework incorporates multi-task selectors, synthetic-embedding-driven ensemble strategies, and joint training on real data and refined embeddings. Contribution/Results: Evaluated on multiple benchmark datasets, the method significantly outperforms state-of-the-art approaches, achieving average prediction accuracy improvements of 3.2%โ€“5.8%.

Technology Category

Application Category

๐Ÿ“ Abstract
Vertical federated learning (VFL) enables a paradigm for vertically partitioned data across clients to collaboratively train machine learning models. Feature selection (FS) plays a crucial role in Vertical Federated Learning (VFL) due to the unique nature that data are distributed across multiple clients. In VFL, different clients possess distinct subsets of features for overlapping data samples, making the process of identifying and selecting the most relevant features a complex yet essential task. Previous FS efforts have primarily revolved around intra-client feature selection, overlooking vital feature interaction across clients, leading to subpar model outcomes. We introduce ICAFS, a novel multi-stage ensemble approach for effective FS in VFL by considering inter-client interactions. By employing conditional feature synthesis alongside multiple learnable feature selectors, ICAFS facilitates ensemble FS over these selectors using synthetic embeddings. This method bypasses the limitations of private gradient sharing and allows for model training using real data with refined embeddings. Experiments on multiple real-world datasets demonstrate that ICAFS surpasses current state-of-the-art methods in prediction accuracy.
Problem

Research questions and friction points this paper is trying to address.

Enhances feature selection in vertical federated learning
Addresses inter-client feature interaction limitations
Improves model accuracy without gradient sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage ensemble for inter-client feature selection
Conditional feature synthesis with learnable selectors
Ensemble FS using synthetic embeddings without gradient sharing
๐Ÿ”Ž Similar Papers
R
Ruochen Jin
University of Pennsylvania, Philadelphia, PA, USA; East China Normal University, Shanghai, China
Boning Tong
Boning Tong
University of Pennsylvania
S
Shu Yang
University of Pennsylvania, Philadelphia, PA, USA
Bojian Hou
Bojian Hou
Meta
Machine LearningArtificial IntelligenceTrustworthy (Gen)AILarge Language ModelHealthTech
L
Li Shen
University of Pennsylvania, Philadelphia, PA, USA