Sparse Spectral LoRA: Routed Experts for Medical VLMs

📅 2026-04-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses key challenges in medical vision-language models: cross-dataset interference under heterogeneous supervision, sensitivity to data mixing, and catastrophic forgetting in continual learning. To mitigate these issues, the authors propose MedQwen, a parameter-efficient medical vision-language model that introduces, for the first time, non-overlapping SVD-based segmented expert initialization. Integrated with a spectral-routing mixture-of-experts (MoE) mechanism, low-rank adaptation (LoRA), and residual compensation with scaling strategies, MedQwen achieves stable expert specialization and consistent routing without modifying the base architecture. Evaluated across 23 medical datasets, the method substantially outperforms baseline approaches, attaining zero-shot classification performance comparable to full fine-tuning while reducing trainable parameters by 339× and limiting sequential forgetting to approximately 5%.
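The non-overlapping SVD-based expert initialization can be sketched as follows. This is a minimal illustration, assuming each expert receives a contiguous, disjoint band of singular triplets of the pretrained weight, folded into LoRA-style `(A, B)` factors, with a residual term so the experts plus residual reproduce the original weight at initialization; the paper's exact factorization and compensation scheme may differ.

```python
import numpy as np

def svd_segment_experts(W, num_experts, rank_per_expert):
    """Initialize LoRA-style expert factors from non-overlapping SVD segments.

    Expert e receives singular triplets [e*r, (e+1)*r), so the segments are
    disjoint and together span the top num_experts*r spectral directions of W.
    (Hypothetical sketch; the paper's exact construction is not reproduced here.)
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    experts = []
    for e in range(num_experts):
        lo, hi = e * rank_per_expert, (e + 1) * rank_per_expert
        sqrt_s = np.sqrt(S[lo:hi])
        A = U[:, lo:hi] * sqrt_s            # shape (d_out, r)
        B = sqrt_s[:, None] * Vt[lo:hi, :]  # shape (r, d_in)
        experts.append((A, B))
    return experts

def residual(W, experts):
    """Residual compensation: the part of W not captured by the expert
    segments, kept frozen so base + experts matches W at initialization."""
    approx = sum(A @ B for A, B in experts)
    return W - approx
```

Because the segments are disjoint bands of the spectrum, the first expert alone reconstructs the rank-`r` truncated SVD of `W`, and adding the residual back recovers `W` exactly.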
๐Ÿ“ Abstract
Large vision-language models (VLMs) excel on general benchmarks but often lack robustness in medical imaging, where heterogeneous supervision induces cross-dataset interference and sensitivity to data regime (i.e., how the supervisory signals are mixed). In realistic clinical workflows, data and tasks arrive sequentially, so naive continual training further leads to catastrophic forgetting. To address these challenges, we propose MedQwen, a parameter-efficient medical VLM that couples a spectrally routed Mixture-of-Experts (MoE) with a theoretically grounded scaling rule that aligns low-rank updates with a full-rank, fully fine-tuned MoE, without changing the base architecture. Concretely, we initialize each expert from non-overlapping singular value decomposition (SVD) segments of the pretrained weight and introduce a residual compensation and scaling scheme to enable stable expert specialization and consistent routing under distribution shift. Across 23 medical datasets covering visual question answering, report generation, radiology classification, and hallucination mitigation, MedQwen achieves strong, reliable performance: it approaches full fine-tuning on zero-shot classification with 339× fewer trainable parameters, and reduces sequential forgetting to ~5% where strong baselines degrade by >20-50%.
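The spectral-routing idea can be sketched as follows. This is a hedged illustration under assumed details: each expert holds LoRA-style factors `(A_e, B_e)` drawn from its SVD segment, a token is routed by how strongly it projects onto each expert's spectral subspace (measured here via `||B_e x||`), and a scalar `alpha` stands in for the paper's scaling rule, whose exact form is not given here.

```python
import numpy as np

def spectral_route(x, experts, top_k=1, alpha=1.0):
    """Hypothetical spectral routing: score each expert by the norm of the
    input's projection through its B factor, keep the top-k experts, gate
    them with a softmax over their scores, and return a scaled low-rank
    update to add to the frozen base output. (Sketch, not the paper's code.)"""
    scores = np.array([np.linalg.norm(B @ x) for _, B in experts])
    chosen = np.argsort(scores)[::-1][:top_k]      # indices of top-k experts
    gates = np.exp(scores[chosen] - scores[chosen].max())
    gates /= gates.sum()                           # softmax over chosen scores
    update = sum(g * (A @ (B @ x))
                 for g, (A, B) in zip(gates, (experts[i] for i in chosen)))
    return alpha * update, chosen
```

Routing on spectral projections rather than on a separately learned gating network is one plausible reading of "consistent routing under distribution shift": the gate is tied to the same subspaces the experts were initialized from.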
Problem

Research questions and friction points this paper is trying to address.

medical vision-language models
cross-dataset interference
catastrophic forgetting
heterogeneous supervision
continual learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Spectral LoRA
Mixture-of-Experts
singular value decomposition
parameter-efficient tuning
catastrophic forgetting