๐ค AI Summary
In vertical federated learning (VFL), collaborative feature selection across clients remains challenging, and existing methods neglect inter-client feature interactions. Method: This paper proposes the first multi-stage ensemble-based feature selection framework that explicitly models inter-client feature interactions. It introduces a cross-client-aware conditional feature synthesis mechanism and a learnable selector ensemble architecture, enabling feature-level collaborative optimization without sharing private gradientsโthus preserving data privacy. The framework incorporates multi-task selectors, synthetic-embedding-driven ensemble strategies, and joint training on real data and refined embeddings. Contribution/Results: Evaluated on multiple benchmark datasets, the method significantly outperforms state-of-the-art approaches, achieving average prediction accuracy improvements of 3.2%โ5.8%.
๐ Abstract
Vertical federated learning (VFL) enables a paradigm for vertically partitioned data across clients to collaboratively train machine learning models. Feature selection (FS) plays a crucial role in Vertical Federated Learning (VFL) due to the unique nature that data are distributed across multiple clients. In VFL, different clients possess distinct subsets of features for overlapping data samples, making the process of identifying and selecting the most relevant features a complex yet essential task. Previous FS efforts have primarily revolved around intra-client feature selection, overlooking vital feature interaction across clients, leading to subpar model outcomes. We introduce ICAFS, a novel multi-stage ensemble approach for effective FS in VFL by considering inter-client interactions. By employing conditional feature synthesis alongside multiple learnable feature selectors, ICAFS facilitates ensemble FS over these selectors using synthetic embeddings. This method bypasses the limitations of private gradient sharing and allows for model training using real data with refined embeddings. Experiments on multiple real-world datasets demonstrate that ICAFS surpasses current state-of-the-art methods in prediction accuracy.