FL-MedSegBench: A Comprehensive Benchmark for Federated Learning on Medical Image Segmentation

📅 2026-03-12
🤖 AI Summary
This work addresses the lack of standardized evaluation benchmarks in federated learning for medical image segmentation, which has led to inconsistent and incomplete assessments. The authors present the first comprehensive federated learning benchmark designed specifically for medical image segmentation, covering nine segmentation tasks across ten imaging modalities in both 2D and 3D settings, with multimodal data and clinically relevant heterogeneity. They systematically evaluate eight generic and five personalized federated methods on segmentation accuracy, fairness, communication efficiency, convergence behavior, and generalization to unseen domains. The experiments show that personalized approaches, particularly those with client-specific batch normalization such as FedBN, generally outperform generic methods, yet no single algorithm is best on every dataset. Normalization-based personalization methods remain robust when communication frequency is reduced, and Ditto and FedRDN protect underperforming clients. The authors release an open-source toolkit to support reproducible research in this domain.

📝 Abstract
Federated learning (FL) offers a privacy-preserving paradigm for collaborative medical image analysis without sharing raw data. However, the absence of standardized benchmarks for medical image segmentation hinders fair and comprehensive evaluation of FL methods. To address this gap, we introduce FL-MedSegBench, the first comprehensive benchmark for federated learning on medical image segmentation. Our benchmark encompasses nine segmentation tasks across ten imaging modalities, covering both 2D and 3D formats with realistic clinical heterogeneity. We systematically evaluate eight generic FL (gFL) and five personalized FL (pFL) methods across multiple dimensions: segmentation accuracy, fairness, communication efficiency, convergence behavior, and generalization to unseen domains. Extensive experiments reveal several key insights: (i) pFL methods, particularly those with client-specific batch normalization (e.g., FedBN), consistently outperform generic approaches; (ii) no single method universally dominates, with performance being dataset-dependent; (iii) communication frequency analysis shows normalization-based personalization methods exhibit remarkable robustness to reduced communication frequency; (iv) fairness evaluation identifies methods like Ditto and FedRDN that protect underperforming clients; (v) a method's generalization to unseen domains is strongly tied to its ability to perform well across participating clients. We will release an open-source toolkit to foster reproducible research and accelerate clinically applicable FL solutions, providing empirically grounded guidelines for real-world clinical deployment. The source code is available at https://github.com/meiluzhu/FL-MedSegBench.
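
To make "client-specific batch normalization" concrete, here is a minimal sketch of one FedBN-style aggregation round in plain Python. It is not taken from the FL-MedSegBench toolkit. The function names, the rule that batch-normalization parameters are recognized by a name component starting with `bn`, and the toy parameter values are all assumptions made for illustration. The idea shown is that the server averages non-BN parameters across clients, weighted by local dataset size, while each client keeps its own BN parameters.

```python
from typing import Dict, List

# A model state is sketched as: parameter name -> flattened list of values.
State = Dict[str, List[float]]


def is_bn_param(name: str) -> bool:
    """Assumed naming rule: a dotted component starting with 'bn' marks a BN parameter."""
    return any(part.startswith("bn") for part in name.split("."))


def fedbn_round(client_states: List[State], client_sizes: List[int]) -> List[State]:
    """Average non-BN parameters (weighted by dataset size); keep BN parameters local."""
    total = sum(client_sizes)
    shared: State = {}
    for name, reference in client_states[0].items():
        if is_bn_param(name):
            continue  # BN parameters are never sent to or overwritten by the server
        shared[name] = [
            sum(state[name][i] * size for state, size in zip(client_states, client_sizes)) / total
            for i in range(len(reference))
        ]
    # Every client receives the shared average but retains its own BN values.
    return [
        {name: list(shared.get(name, values)) for name, values in state.items()}
        for state in client_states
    ]


# Toy example with two clients holding 1 and 3 training samples.
clients = [
    {"conv1.weight": [1.0, 2.0], "bn1.weight": [0.5]},
    {"conv1.weight": [3.0, 6.0], "bn1.weight": [1.5]},
]
updated = fedbn_round(clients, client_sizes=[1, 3])
print(updated[0])  # {'conv1.weight': [2.5, 5.0], 'bn1.weight': [0.5]}
print(updated[1])  # {'conv1.weight': [2.5, 5.0], 'bn1.weight': [1.5]}
```

In this sketch, if `is_bn_param` always returned `False`, the round would reduce to plain size-weighted averaging of all parameters, which is the generic baseline that personalized methods are compared against.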
Problem

Research questions and friction points this paper is trying to address.

federated learning
medical image segmentation
benchmark
privacy-preserving
clinical heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Medical Image Segmentation
Benchmark
Personalized FL
Clinical Heterogeneity
Authors

Meilu Zhu
City University of Hong Kong
Machine Learning, Deep Learning, Computer Vision, Image Processing, Federated Learning
Zhiwei Wang
Department of Electrical and Computer Engineering, The University of Hong Kong, Hong Kong, China
Axiu Mao
School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China
Yuxing Li
Department of Electrical and Computer Engineering, The University of Hong Kong, Hong Kong, China
Xiaohan Xing
Stanford University
Medical Image Analysis, Omics Analysis, Deep Learning, Multi-modal Learning
Yixuan Yuan
Associate Professor, The Chinese University of Hong Kong
Medical image analysis, AI in healthcare, Brain data analysis, Endoscopy
Edmund Y. Lam
Department of Electrical and Computer Engineering, The University of Hong Kong, Hong Kong, China