Federated Imputation under Heterogeneous Feature Spaces

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses missing-value imputation in federated learning when clients hold only partially overlapping feature sets. The authors propose FedHF-Impute, a framework that exploits statistical dependencies among features in heterogeneous-feature federated settings. It builds a globally shared feature graph and applies message passing in the style of graph neural networks, modeling structural feature unavailability separately from conventional missingness and thereby enabling indirect knowledge transfer across clients. Under simulated partial schema overlap, FedHF-Impute reduces RMSE relative to FL baselines by 26.9% on SECOM and 8.4% on AirQuality, and comes within 0.3% of the best baseline on PhysioNET.

📝 Abstract

Federated Learning (FL) enables collaborative training across decentralized clients, but most methods assume aligned feature schemas, an assumption that rarely holds in tabular settings where clients observe only partially overlapping feature subsets. In these heterogeneous feature spaces, parameter-averaging methods (e.g., FedAvg) transfer little information across weakly overlapping or disjoint feature groups, limiting their effectiveness for federated imputation. To overcome this, we propose FedHF-Impute, a federated imputation framework that separates structural feature unavailability from conventional missingness and uses a shared global feature graph to propagate information across statistically related features through message passing. This enables indirect cross-client knowledge transfer, even when features are never jointly observed locally, while preserving standard federated communication. Under simulated partial schema overlap on the SECOM and AirQuality datasets, FedHF-Impute improves imputation accuracy (RMSE) over FL baselines by 26.9% and 8.4%, respectively, while achieving comparable performance on PhysioNET, with only a 0.3% difference relative to the best baseline.
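The two ideas in the abstract, separating structural unavailability from conventional missingness and propagating information over a shared feature graph, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `split_masks` and `propagate`, the hand-written adjacency matrix `A`, and the fixed (unlearned) mean aggregation are all assumptions made for illustration, standing in for whatever learned message-passing layers FedHF-Impute actually uses.

```python
import numpy as np

def split_masks(X, client_features):
    """Split a client's table into three boolean masks (hypothetical helper).

    structural:   the column is outside the client's schema entirely
    conventional: the column is in the schema but this value is NaN
    observed:     the column is in the schema and the value is present
    """
    n, d = X.shape
    in_schema = np.zeros(d, dtype=bool)
    in_schema[list(client_features)] = True
    is_nan = np.isnan(X)
    structural = np.broadcast_to(~in_schema, (n, d)).copy()
    conventional = is_nan & in_schema
    observed = ~is_nan & in_schema
    return observed, structural, conventional

def propagate(X, observed, A):
    """One round of mean-aggregation message passing over the feature graph.

    A is an assumed d x d nonnegative adjacency matrix with zero diagonal.
    Each unobserved entry receives the weighted average of the observed
    neighbouring features in the same row; observed entries are kept.
    """
    Xz = np.where(observed, X, 0.0)          # zero out everything unobserved
    num = Xz @ A.T                           # weighted sum of observed neighbours
    den = observed.astype(float) @ A.T       # total weight of observed neighbours
    est = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return np.where(observed, X, est)

# Toy client: schema covers features 0 and 1; feature 2 is structurally absent.
X = np.array([[2.0, np.nan, np.nan],
              [4.0, 5.0,    np.nan]])
# Assumed global graph: a chain 0 - 1 - 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

observed, structural, conventional = split_masks(X, client_features=[0, 1])
filled = propagate(X, observed, A)
```

In this toy example, the conventionally missing value in row 0 of feature 1 is filled from feature 0, and the structurally absent feature 2 in row 1 is filled from its graph neighbour, feature 1, even though this client never observes feature 2. Entries with no observed neighbour stay at zero after one round; further rounds would reach them through intermediate features.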

Problem

Research questions and friction points this paper is trying to address.

Federated Imputation

Heterogeneous Feature Spaces

Missing Data

Feature Alignment

Decentralized Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Imputation

Heterogeneous Feature Spaces

Feature Graph