Semi-Supervised Federated Multi-Label Feature Selection with Fuzzy Information Measures

📅 2025-11-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address limited labeling capability and non-IID data distributions among clients in distributed multi-label learning, this paper proposes the first semi-supervised federated multi-label feature selection framework. Clients need no labeled data: each uploads only its local fuzzy similarity matrices, while the server holds a small labeled set. At the server, a graph-based model captures feature relevance and redundancy, and a combination of fuzzy information measures with the PageRank algorithm ranks features by importance so that irrelevant, noisy, and redundant ones can be dropped. The work adapts fuzzy information theory to federated feature selection, enabling collaborative feature ranking when clients are unlabeled and their data are non-IID. Experiments on five real-world datasets from different domains show that the method outperforms existing federated and centralized baselines on three evaluation metrics.

📝 Abstract

Multi-label feature selection (FS) reduces the dimensionality of multi-label data by removing irrelevant, noisy, and redundant features, thereby boosting the performance of multi-label learning models. However, existing methods typically require centralized data, which makes them unsuitable for distributed and federated environments where each device/client holds its own local dataset. Additionally, federated methods often assume that clients have labeled data, which is unrealistic when clients lack the expertise or resources to label task-specific data. To address these challenges, we propose a Semi-Supervised Federated Multi-Label Feature Selection method, called SSFMLFS, in which clients hold only unlabeled data while the server has limited labeled data. SSFMLFS adapts fuzzy information theory to a federated setting: clients compute fuzzy similarity matrices and transmit them to the server, which then calculates feature redundancy and feature-label relevancy degrees. A feature graph is constructed by modeling features as vertices and assigning relevancy and redundancy degrees as vertex weights and edge weights, respectively. PageRank is then applied to rank the features by importance. Extensive experiments on five real-world datasets from various domains, including biology, images, music, and text, demonstrate that SSFMLFS outperforms other federated and centralized supervised and semi-supervised approaches on three different evaluation metrics in a non-IID data distribution setting.
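
The client-side step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the similarity relation, so this example assumes a form common in the fuzzy rough set literature (1 - |x_i - x_j| on min-max scaled values) and a standard fuzzy information entropy; the function names `fuzzy_similarity`, `client_matrices`, and `fuzzy_entropy` are hypothetical and not taken from the paper.

```python
import numpy as np

def fuzzy_similarity(column):
    """Pairwise fuzzy similarity of samples on one feature.

    Assumed relation (not from the paper): 1 - |x_i - x_j| after
    min-max scaling the feature to [0, 1].
    """
    col = np.asarray(column, dtype=float)
    span = col.max() - col.min()
    # A constant feature carries no contrast: all samples are fully similar.
    scaled = (col - col.min()) / span if span > 0 else np.zeros_like(col)
    return 1.0 - np.abs(scaled[:, None] - scaled[None, :])

def client_matrices(X):
    """Stack one n x n similarity matrix per feature: shape (d, n, n).

    Only these matrices, not raw samples or labels, would leave the client.
    """
    X = np.asarray(X, dtype=float)
    return np.stack([fuzzy_similarity(X[:, k]) for k in range(X.shape[1])])

def fuzzy_entropy(R):
    """Fuzzy information entropy of a similarity relation R (assumed measure):
    H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with |[x_i]_R| = sum_j R[i, j].
    """
    n = R.shape[0]
    return float(-np.mean(np.log2(R.sum(axis=1) / n)))

# Three unlabeled samples, two features (the second feature is constant).
X_local = [[0.0, 1.0], [1.0, 1.0], [0.5, 1.0]]
matrices = client_matrices(X_local)
entropies = [fuzzy_entropy(R) for R in matrices]
```

Under these assumptions a constant feature yields an all-ones relation and zero entropy, while a feature that separates samples yields positive entropy. The server could build redundancy and relevancy degrees from such measures, though the exact formulas are not given in this summary.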

Problem

Research questions and friction points this paper is trying to address.

Federated feature selection for multi-label data without centralized storage

Handling clients with unlabeled data using semi-supervised fuzzy measures

Addressing feature redundancy and relevancy in distributed non-IID settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised federated multi-label feature selection method

Fuzzy information theory adapted to federated setting

PageRank applied to rank features by importance
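
As a rough illustration of the PageRank ranking step, the sketch below runs weighted, personalized PageRank by power iteration on a small feature graph. It is an assumption-laden example, not the paper's formulation: vertex weights (relevancy) are used as the teleport distribution, the edge-weight matrix is used as given (this summary does not say whether redundancy is used directly or transformed first), and the function name `rank_features` and the damping factor 0.85 are hypothetical choices.

```python
import numpy as np

def rank_features(relevance, edge_weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Weighted personalized PageRank over a feature graph (illustrative sketch).

    relevance    : length-d vertex weights, used as the teleport distribution
    edge_weights : d x d non-negative edge weights, used as given
    Returns (scores, order), where order lists feature indices best-first.
    """
    rel = np.asarray(relevance, dtype=float)
    W = np.asarray(edge_weights, dtype=float).copy()
    np.fill_diagonal(W, 0.0)  # ignore self-loops
    n = rel.size

    # Teleport vector proportional to relevance (uniform if all weights are zero).
    teleport = rel / rel.sum() if rel.sum() > 0 else np.full(n, 1.0 / n)

    # Row-stochastic transition matrix; vertices with no outgoing weight
    # fall back to the teleport distribution.
    row_sums = W.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    P = np.where(row_sums > 0, W / safe_sums, teleport)

    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = damping * (scores @ P) + (1.0 - damping) * teleport
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores, np.argsort(-scores)

# Toy graph: three features with uniform edge weights and differing relevance.
toy_relevance = [0.9, 0.1, 0.5]
toy_edges = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
scores, order = rank_features(toy_relevance, toy_edges)
```

With uniform edge weights the ranking follows the relevance values alone; non-uniform edge weights shift score between features according to whatever redundancy-derived weighting is supplied.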

🔎 Similar Papers

Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning