FL-MedSegBench: A Comprehensive Benchmark for Federated Learning on Medical Image Segmentation

📅 2026-03-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of standardized evaluation benchmarks in federated learning for medical image segmentation, which has led to inconsistent and incomplete assessments. The authors present the first comprehensive federated learning benchmark designed specifically for medical image segmentation, covering ten imaging modalities and nine tasks that span 2D and 3D settings, multimodal data, and clinically relevant heterogeneity. They systematically evaluate eight generic and five personalized federated methods along several dimensions, including accuracy, fairness, and communication efficiency. Their experiments show that personalized approaches such as FedBN generally outperform generic methods, yet no single algorithm is best on every dataset. Notably, normalization-based personalization methods remain robust when communication is less frequent, and Ditto and FedRDN improve performance for disadvantaged clients. The authors release an open-source toolkit to support reproducible research in this domain.
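
To make the accuracy and fairness dimensions concrete, the following is a minimal sketch of how per-client segmentation quality might be summarized. It assumes binary masks flattened to 0/1 sequences and uses the Dice coefficient together with the mean, worst-client score, and standard deviation across clients. These are common choices, not necessarily the metrics the benchmark itself reports, and the names `dice_score` and `fairness_summary` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary (0/1) masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    if denom == 0:
        # Both masks are empty: treat as perfect agreement (an assumption).
        return 1.0
    return 2.0 * intersection / denom


def fairness_summary(client_dice: Dict[str, float]) -> Dict[str, float]:
    """Summarize per-client Dice: average, worst client, and spread."""
    scores = list(client_dice.values())
    return {
        "mean": mean(scores),    # overall accuracy across clients
        "worst": min(scores),    # how the most disadvantaged client fares
        "std": pstdev(scores),   # lower spread suggests more uniform quality
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
    print(fairness_summary({"site_a": 0.9, "site_b": 0.7}))
```

Under this reading, a method that "protects disadvantaged clients" would raise the worst-client score or shrink the spread without lowering the mean much.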

📝 Abstract

Federated learning (FL) offers a privacy-preserving paradigm for collaborative medical image analysis without sharing raw data. However, the absence of standardized benchmarks for medical image segmentation hinders fair and comprehensive evaluation of FL methods. To address this gap, we introduce FL-MedSegBench, the first comprehensive benchmark for federated learning on medical image segmentation. Our benchmark encompasses nine segmentation tasks across ten imaging modalities, covering both 2D and 3D formats with realistic clinical heterogeneity. We systematically evaluate eight generic FL (gFL) and five personalized FL (pFL) methods across multiple dimensions: segmentation accuracy, fairness, communication efficiency, convergence behavior, and generalization to unseen domains. Extensive experiments reveal several key insights: (i) pFL methods, particularly those with client-specific batch normalization (e.g., FedBN), consistently outperform generic approaches; (ii) no single method universally dominates, with performance being dataset-dependent; (iii) communication frequency analysis shows that normalization-based personalization methods exhibit remarkable robustness to reduced communication frequency; (iv) fairness evaluation identifies methods like Ditto and FedRDN that protect underperforming clients; (v) a method's generalization to unseen domains is strongly tied to its ability to perform well across participating clients. We will release an open-source toolkit to foster reproducible research and accelerate clinically applicable FL solutions, providing empirically grounded guidelines for real-world clinical deployment. The source code is available at https://github.com/meiluzhu/FL-MedSegBench.
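
The idea behind client-specific batch normalization can be illustrated with a small sketch: the server averages every parameter except batch-norm ones, and each client keeps its own batch-norm parameters locally. This is not the toolkit's code. It assumes model states are plain dictionaries of flattened float lists, that batch-norm parameters can be recognized by a name component starting with `bn` (a hypothetical naming convention), and that averaging is weighted by each client's sample count as in FedAvg.

```python
from typing import Dict, List

State = Dict[str, List[float]]  # parameter name -> flattened values


def is_bn_param(name: str) -> bool:
    """Assumed convention: a dotted name component starting with 'bn'."""
    return any(part.startswith("bn") for part in name.split("."))


def fedbn_aggregate(client_states: List[State], num_samples: List[int]) -> State:
    """Sample-weighted average of all non-batch-norm parameters."""
    total = sum(num_samples)
    shared: State = {}
    for name, values in client_states[0].items():
        if is_bn_param(name):
            continue  # batch-norm parameters are never sent to the server
        shared[name] = [
            sum(state[name][i] * n for state, n in zip(client_states, num_samples)) / total
            for i in range(len(values))
        ]
    return shared


def apply_global(local_state: State, shared: State) -> State:
    """Overwrite shared parameters; keep the client's own batch-norm ones."""
    updated = dict(local_state)
    updated.update(shared)
    return updated


if __name__ == "__main__":
    clients = [
        {"conv.weight": [1.0, 2.0], "bn1.weight": [0.5]},
        {"conv.weight": [3.0, 4.0], "bn1.weight": [1.5]},
    ]
    shared = fedbn_aggregate(clients, num_samples=[1, 3])
    print(apply_global(clients[0], shared))
```

Because batch-norm parameters stay on each client, they are not part of what is averaged, which is one intuition for why such methods could tolerate less frequent communication; the sketch does not demonstrate that effect.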

Problem

Research questions and friction points this paper is trying to address.

federated learning

medical image segmentation

benchmark

privacy-preserving

clinical heterogeneity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Medical Image Segmentation

Benchmark