FedCFA: Alleviating Simpson's Paradox in Model Aggregation with Counterfactual Federated Learning

📅 2024-12-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In federated learning (FL), data heterogeneity frequently triggers Simpson's paradox, causing the global model to deviate from the true underlying distribution and suffer performance degradation. To address this, this work introduces counterfactual reasoning into FL aggregation for the first time, proposing a factor-decoupled local data intervention mechanism: critical factors are identified and substituted via controlled-variable sampling to align local data distributions with the global one; additionally, a Factor De-Correlation (FDC) loss is designed to enforce feature independence, systematically mitigating model bias induced by the paradox. Evaluated on six heterogeneous benchmarks under communication constraints, the method achieves significant accuracy gains over state-of-the-art FL algorithms, with up to 37% faster convergence. It robustly enhances the global model's representational fidelity to the true distribution and overall generalization capability.
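The factor substitution step above can be sketched as follows. This is a minimal illustration, not the paper's method: the variance-based choice of "critical" dimensions and the function name `counterfactual_samples` are assumptions standing in for FedCFA's learned factor decoupling.

```python
import numpy as np

def counterfactual_samples(local_x, global_avg, k=2):
    """Sketch: replace the k most variable feature dimensions of each
    local sample with the corresponding global-average values, yielding
    counterfactual samples whose critical factors follow the global data.

    local_x: (n, d) batch of local features.
    global_avg: (d,) global average feature vector.
    """
    # Treat the highest-variance dimensions as the "critical factors"
    # (an illustrative stand-in for the paper's factor identification).
    critical = np.argsort(local_x.var(axis=0))[-k:]
    cf = local_x.copy()
    cf[:, critical] = global_avg[critical]
    return cf
```

In this reading, the intervention is a controlled-variable substitution: only the selected factors are overwritten with global statistics, while the remaining dimensions of each local sample are left untouched.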

๐Ÿ“ Abstract
Federated learning (FL) is a promising technology for data privacy and distributed optimization, but it suffers from data imbalance and heterogeneity among clients. Existing FL methods try to solve these problems by aligning the client model with the server model or by correcting the client model with control variables. These methods excel on IID and general Non-IID data but perform mediocrely in Simpson's Paradox scenarios. Simpson's Paradox refers to the phenomenon that a trend observed on the global dataset disappears or reverses on a subset, which may mean that the global model obtained through aggregation in FL does not accurately reflect the distribution of the global data. Thus, we propose FedCFA, a novel FL framework employing counterfactual learning to generate counterfactual samples by replacing critical factors of local data with global average data, aligning local data distributions with the global distribution and mitigating Simpson's Paradox effects. In addition, to improve the quality of counterfactual samples, we introduce a factor decorrelation (FDC) loss to reduce the correlation among features and thus improve the independence of the extracted factors. We conduct extensive experiments on six datasets and verify that our method outperforms other FL methods in terms of efficiency and global model accuracy under limited communication rounds.
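The FDC loss mentioned in the abstract can be sketched as a penalty on pairwise feature correlation. The concrete formulation below (mean absolute off-diagonal Pearson correlation) is an assumption for illustration; the paper's exact loss may differ.

```python
import numpy as np

def fdc_loss(features, eps=1e-8):
    """Sketch of a factor de-correlation penalty: mean absolute
    off-diagonal Pearson correlation across feature dimensions.

    features: (batch, d) matrix of extracted factor activations.
    Returns 0 when the d factors are pairwise uncorrelated and
    approaches 1 as they become perfectly (anti-)correlated.
    """
    # Standardize each feature dimension over the batch.
    f = features - features.mean(axis=0)
    f = f / (f.std(axis=0) + eps)
    corr = f.T @ f / f.shape[0]          # (d, d) correlation matrix
    d = corr.shape[0]
    off_diag = corr - np.eye(d)          # zero out the diagonal
    return np.abs(off_diag).sum() / (d * (d - 1))
```

Minimizing such a term pushes the extracted factors toward statistical independence, which is what lets the counterfactual substitution intervene on one factor without implicitly changing the others.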
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Simpson's Paradox
Data Distribution Disparity
Innovation

Methods, ideas, or system contributions that make the work stand out.

FedCFA
Counterfactual Learning
Factor De-Correlation (FDC)
Zhonghua Jiang
Zhejiang University
Multimodal LLM · Efficient AI · 3D Generation · Federated Learning

Jimin Xu
Zhejiang University

Shengyu Zhang
Zhejiang University

Tao Shen
Zhejiang University

Jiwei Li
Zhejiang University

Kun Kuang
Zhejiang University
Causal Inference · Data Mining · Machine Learning

Haibin Cai
East China Normal University

Fei Wu
Zhejiang University