🤖 AI Summary
Problem: In vertical federated learning (VFL), participants hold heterogeneous feature sets, which makes it hard to quantify each participant's contribution to the model. Existing participant selection methods are designed primarily for horizontal FL and do not account for feature complementarity or task-specific utility in VFL.
Method: We propose VFL-RPS, a participant selection framework designed specifically for VFL, which leverages feature-space correlation modeling and lightweight proxy model pretraining. It combines feature alignment analysis with gradient sensitivity evaluation to score participant contributions efficiently before training begins.
Contribution/Results: VFL-RPS is the first method to jointly optimize for feature complementarity and task adaptability in VFL, and it supports both regression and classification tasks. Experiments on multiple benchmark datasets show that selecting only 30%–50% of participants achieves performance comparable to full participation while reducing communication and computational overhead. VFL-RPS also outperforms existing participant selection methods for VFL.
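To make the idea of scoring participants before training more concrete, here is a minimal sketch of a correlation-based selection heuristic. It is not the VFL-RPS algorithm, whose details are not given here. It is a hypothetical greedy rule that rewards a participant for features correlated with the target (relevance) and penalizes overlap with already selected participants (redundancy). The function names, the Pearson-based scores, and the synthetic data are all assumptions made for illustration, and the sketch pools data centrally, which a real VFL deployment would replace with a privacy-preserving protocol.

```python
import numpy as np

def relevance(X, y):
    """Mean |Pearson r| between each feature column of one participant and the target."""
    r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return float(np.mean(r))

def redundancy(Xa, Xb):
    """Mean |Pearson r| over all cross-participant feature pairs."""
    na = Xa.shape[1]
    c = np.corrcoef(Xa, Xb, rowvar=False)  # shape (na + nb, na + nb)
    return float(np.mean(np.abs(c[:na, na:])))

def select_participants(blocks, y, k):
    """Greedily pick k participants by relevance minus mean redundancy (hypothetical rule)."""
    rel = [relevance(X, y) for X in blocks]
    chosen = []
    while len(chosen) < k:
        best, best_val = None, -np.inf
        for i in range(len(blocks)):
            if i in chosen:
                continue
            red = np.mean([redundancy(blocks[i], blocks[j]) for j in chosen]) if chosen else 0.0
            val = rel[i] - red
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return chosen

# Synthetic example (assumed data): the target depends on two independent signals a and b.
rng = np.random.default_rng(0)
n = 500
a, b = rng.normal(size=n), rng.normal(size=n)
y = a + b + 0.1 * rng.normal(size=n)

blocks = [
    (a + 0.1 * rng.normal(size=n)).reshape(-1, 1),  # participant 0: noisy view of a
    (a + 0.1 * rng.normal(size=n)).reshape(-1, 1),  # participant 1: near-duplicate of 0
    (b + 0.1 * rng.normal(size=n)).reshape(-1, 1),  # participant 2: noisy view of b
    rng.normal(size=(n, 2)),                        # participant 3: pure noise
]

chosen = select_participants(blocks, y, k=2)
print("selected participants:", chosen)
```

In this toy setup the heuristic should pair the participant holding signal b with only one of the two near-duplicate holders of signal a, which illustrates why complementarity matters in addition to individual relevance.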
📝 Abstract
Federated Learning (FL) allows different parties to collaborate without sharing their data. However, not every collaboration improves the resulting model's performance, so selecting the right participants is an important challenge. Most existing work on participant selection has focused on Horizontal Federated Learning (HFL), which assumes that all participants share the same feature set, and disregards the setting in which participants hold different features, as captured by Vertical Federated Learning (VFL). To close this gap, we propose VFL-RPS, a novel method for participant selection in VFL that runs as a pre-training step. We tested our method on several datasets covering both regression and classification tasks, and show that selecting only a few participants yields results comparable to using all the data. In addition, we show that our method outperforms existing methods for participant selection in VFL.