🤖 AI Summary
Medical report generation (MRG) faces critical challenges under multi-center federated learning (FL), including privacy sensitivity, heterogeneous data modalities, and severe communication constraints. Method: We propose FedMRG—the first privacy-preserving FL framework for MRG—featuring (i) low-rank gradient decomposition to drastically reduce communication overhead; (ii) client-aware contrastive learning and diagnosis-driven prompt encoding to mitigate statistical heterogeneity; and (iii) a dual-adapter co-adaptive decoding mechanism to jointly model cross-center imaging feature discrepancies and report stylistic variations. Results: Evaluated on our newly established FL-MRG benchmark, FedMRG achieves state-of-the-art clinical accuracy, significantly improves cross-center generalization, reduces communication cost by 62%, and enables deployment in bandwidth-constrained real-world clinical environments.
📝 Abstract
LLMs have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge, we present FedMRG, the first framework that leverages Federated Learning (FL) to enable privacy-preserving, multi-center development of LLM-driven MRG models, specifically designed to overcome the critical challenge of communication-efficient LLM training under multi-modal data heterogeneity. To start with, our framework tackles the fundamental challenge of communication overhead in FL-LLM tuning by employing low-rank factorization to efficiently decompose parameter updates, significantly reducing gradient transmission costs and making LLM-driven MRG feasible in bandwidth-constrained FL settings. Furthermore, we observed the dual heterogeneity in MRG under the FL scenario: varying image characteristics across medical centers, as well as diverse reporting styles and terminology preferences. To address this, we further enhance FedMRG with (1) client-aware contrastive learning in the MRG encoder, coupled with diagnosis-driven prompts, which capture both globally generalizable and locally distinctive features while maintaining diagnostic accuracy; and (2) a dual-adapter mutual boosting mechanism in the MRG decoder that harmonizes generic and specialized adapters to address variations in reporting styles and terminology. Through extensive evaluation of our established FL-MRG benchmark, we demonstrate the generalizability and adaptability of FedMRG, underscoring its potential in harnessing multi-center data and generating clinically accurate reports while maintaining communication efficiency.