Semi-Supervised Federated Multi-Label Feature Selection with Fuzzy Information Measures

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address limited labeling capability and non-IID data distributions among clients in distributed multi-label learning, this paper proposes SSFMLFS, a semi-supervised federated multi-label feature selection framework. Clients need no labeled data: each one computes fuzzy similarity matrices locally and uploads only those, while the server holds a small labeled set. The server uses the matrices to compute feature-label relevancy and feature-feature redundancy, builds a feature graph with relevancy as vertex weights and redundancy as edge weights, and applies PageRank to rank features by importance. The work adapts fuzzy information theory to federated feature selection, so that collaboration remains possible when clients are unlabeled and their data are non-IID. Experiments on five real-world datasets from biology, images, music, and text show that the method outperforms federated and centralized supervised and semi-supervised baselines on three evaluation metrics.

📝 Abstract
Multi-label feature selection (FS) reduces the dimensionality of multi-label data by removing irrelevant, noisy, and redundant features, thereby boosting the performance of multi-label learning models. However, existing methods typically require centralized data, which makes them unsuitable for distributed and federated environments where each device/client holds its own local dataset. Additionally, federated methods often assume that clients have labeled data, which is unrealistic when clients lack the expertise or resources to label task-specific data. To address these challenges, we propose a Semi-Supervised Federated Multi-Label Feature Selection method, called SSFMLFS, in which clients hold only unlabeled data while the server has limited labeled data. SSFMLFS adapts fuzzy information theory to a federated setting: clients compute fuzzy similarity matrices and transmit them to the server, which then calculates feature redundancy and feature-label relevancy degrees. A feature graph is constructed by modeling features as vertices and assigning relevancy and redundancy degrees as vertex weights and edge weights, respectively. PageRank is then applied to rank the features by importance. Extensive experiments on five real-world datasets from various domains, including biology, images, music, and text, demonstrate that SSFMLFS outperforms other federated and centralized supervised and semi-supervised approaches on three different evaluation metrics in a non-IID data distribution setting.
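To make the client-side step concrete, the following is a minimal sketch of how a fuzzy similarity matrix and a fuzzy mutual information score could be computed. It is not the paper's implementation. The similarity relation `1 - |x_i - x_j|` on min-max scaled values, the row-sum form of fuzzy entropy, and the element-wise minimum as the joint relation are one common formulation from the fuzzy rough set literature and are assumptions here; the function names (`fuzzy_similarity`, `fuzzy_entropy`, `fuzzy_mutual_information`, `client_payload`) are hypothetical.

```python
import math

def fuzzy_similarity(column):
    """Fuzzy similarity relation for one feature: R[i][j] = 1 - |x_i - x_j|.

    Assumes the values are already min-max scaled to [0, 1] (an assumption
    of this sketch, not something stated in the abstract).
    """
    return [[1.0 - abs(a - b) for b in column] for a in column]

def fuzzy_entropy(rel):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), where |[x_i]_R| is row sum i."""
    n = len(rel)
    return -sum(math.log2(sum(row) / n) for row in rel) / n

def fuzzy_mutual_information(rel_a, rel_b):
    """I(A; B) = H(A) + H(B) - H(A, B), with the joint relation taken as
    the element-wise minimum of the two similarity matrices."""
    joint = [[min(a, b) for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(rel_a, rel_b)]
    return fuzzy_entropy(rel_a) + fuzzy_entropy(rel_b) - fuzzy_entropy(joint)

def client_payload(samples):
    """Hypothetical upload from one client: one similarity matrix per feature.

    `samples` is a list of rows, each row a list of scaled feature values.
    No labels are involved, matching the unlabeled-client setting.
    """
    columns = list(zip(*samples))
    return [fuzzy_similarity(col) for col in columns]

# Toy local dataset: 3 samples, 3 features (features 0 and 1 are identical).
local_data = [[0.0, 0.0, 0.3],
              [0.5, 0.5, 0.9],
              [1.0, 1.0, 0.1]]
matrices = client_payload(local_data)
redundancy_01 = fuzzy_mutual_information(matrices[0], matrices[1])
redundancy_02 = fuzzy_mutual_information(matrices[0], matrices[2])
```

In this sketch a server-side redundancy score between two features is simply their fuzzy mutual information, so the duplicated features 0 and 1 share all of feature 0's entropy. How SSFMLFS aggregates matrices from several clients is not described in the abstract and is left out here.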
Problem

Research questions and friction points this paper is trying to address.

Federated feature selection for multi-label data without centralized storage
Handling clients with unlabeled data using semi-supervised fuzzy measures
Addressing feature redundancy and relevancy in distributed non-IID settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised federated multi-label feature selection method
Fuzzy information theory adapted to federated setting
PageRank applied to rank features by importance
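
The ranking step can be illustrated with a small weighted PageRank written as a plain power iteration. This is a sketch under stated assumptions, not the authors' algorithm: the summary says only that relevancy becomes vertex weights and redundancy becomes edge weights, so using the vertex weights as the teleport distribution and the edge weights as transition strengths is a choice made here for illustration. The function name `rank_features` and the default `damping=0.85` are likewise hypothetical.

```python
def rank_features(edge_w, vertex_w, damping=0.85, iters=100):
    """Personalised PageRank over a feature graph.

    edge_w[i][j] : non-negative weight of the edge from feature i to j
    vertex_w[j]  : non-negative weight of feature j (for example, relevancy)

    Assumption of this sketch: vertex weights define the teleport
    distribution, and edge weights are row-normalised into transition
    probabilities. Any transformation of redundancy into an edge weight
    is left to the caller.
    """
    n = len(vertex_w)
    total = sum(vertex_w)
    teleport = [v / total for v in vertex_w]
    out_weight = [sum(row) for row in edge_w]

    score = teleport[:]
    for _ in range(iters):
        nxt = [(1.0 - damping) * t for t in teleport]
        for i in range(n):
            if out_weight[i] == 0:
                # Dangling vertex: spread its mass by the teleport distribution.
                for j in range(n):
                    nxt[j] += damping * score[i] * teleport[j]
            else:
                for j in range(n):
                    nxt[j] += damping * score[i] * edge_w[i][j] / out_weight[i]
        score = nxt

    order = sorted(range(n), key=lambda j: score[j], reverse=True)
    return order, score

# Toy graph: three features, uniform edges, differing vertex weights.
edges = [[0.0, 1.0, 1.0],
         [1.0, 0.0, 1.0],
         [1.0, 1.0, 0.0]]
relevancy = [0.6, 0.3, 0.1]
order, score = rank_features(edges, relevancy)
```

With uniform edges the ordering is driven entirely by the vertex weights, so `order` lists the features from most to least relevant; non-uniform edge weights would shift score toward features that receive more incoming weight.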