🤖 AI Summary
Existing parameter-efficient fine-tuning methods struggle to differentially model the diverse pathological features present in multi-label head CT scans. To address this limitation, this work proposes the Mixture of Low-Rank Experts (MoLRE) framework, which introduces a mixture-of-experts mechanism into low-rank adaptation for the first time. MoLRE employs multiple specialized low-rank adapters coupled with an unsupervised soft routing strategy, enabling label-free, conditional feature adaptation while adding fewer than 0.5% additional parameters. Comprehensive evaluation on over 70,000 head CT scans spanning 75 pathology classes demonstrates that MoLRE consistently improves abnormality detection across six prominent 2D and 3D medical foundation models. Notably, MedGemma+MoLRE achieves the highest average AUC of 0.917, with the largest gains observed in general-purpose and medical-domain models (+4.3% to +4.6%).
📝 Abstract
Foundation models pre-trained on large-scale datasets demonstrate strong transfer learning capabilities; however, their adaptation to complex multi-label diagnostic tasks, such as comprehensive head CT finding detection, remains understudied. Standard parameter-efficient fine-tuning methods such as LoRA apply uniform adaptations across pathology types, which may limit performance for diverse medical findings. We propose a Mixture of Low-Rank Experts (MoLRE) framework that extends LoRA with multiple specialized low-rank adapters and unsupervised soft routing. This approach enables conditional feature adaptation with less than 0.5% additional parameters and without explicit pathology supervision. We present a comprehensive benchmark of MoLRE across six state-of-the-art medical imaging foundation models spanning 2D and 3D architectures; general-domain, medical-domain, and head CT-specific pretraining; and model sizes ranging from 7M to 431M parameters. Using over 70,000 non-contrast head CT scans with 75 annotated findings, including hemorrhage, infarction, trauma, mass lesions, structural abnormalities, and chronic changes, our experiments demonstrate consistent performance improvements across all models. Gains vary substantially: general-purpose and medical-domain models show the largest improvements (DINOv3-Base: +4.6%; MedGemma: +4.3%), whereas 3D CT-specialized or very large models show more modest gains (+0.2% to +1.3%). The combination of MoLRE and MedGemma achieves the highest average detection AUC of 0.917. These findings highlight the importance of systematic benchmarking on target clinical tasks, as pretraining domain, architecture, and model scale interact in non-obvious ways.
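To make the mechanism concrete, the following is a minimal NumPy sketch of a mixture-of-low-rank-experts layer as described above: a frozen base projection plus several low-rank adapters whose outputs are mixed by an unsupervised softmax router. All dimensions, the zero initialization of the `B` factors, and the linear router are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 16, 4, 3  # toy sizes (assumed)

# Frozen pretrained weight (stand-in for one foundation-model layer).
W = rng.standard_normal((d_out, d_in)) * 0.02

# Low-rank expert factors: expert i contributes B[i] @ A[i], with rank << d.
A = rng.standard_normal((n_experts, rank, d_in)) * 0.02
B = np.zeros((n_experts, d_out, rank))  # zero-init so adaptation starts as a no-op

# Lightweight linear router producing soft (softmax) weights per input —
# no pathology labels are needed, matching the "unsupervised soft routing" idea.
W_router = rng.standard_normal((n_experts, d_in)) * 0.02

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def molre_forward(x):
    """x: (batch, d_in) -> (batch, d_out)."""
    gates = softmax(x @ W_router.T)               # (batch, n_experts)
    base = x @ W.T                                # frozen pretrained path
    low = np.einsum('eri,bi->ber', A, x)          # each expert's rank-r projection
    up = np.einsum('eor,ber->beo', B, low)        # back up to d_out per expert
    delta = (gates[:, :, None] * up).sum(axis=1)  # gate-weighted mixture of updates
    return base + delta

x = rng.standard_normal((2, d_in))
y = molre_forward(x)
```

Because only `A`, `B`, and `W_router` would be trained, the added parameter count is `n_experts * rank * (d_in + d_out) + n_experts * d_in`, which for realistic layer widths stays well under the sub-0.5% overhead the paper reports.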