MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification

📅 2025-08-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing radiology report classification methods suffer from three key limitations: rule-based approaches lack generalizability; supervised models require extensive labeled data; and large language model (LLM)-based solutions are predominantly closed-source, computationally expensive, and restricted to English and single-label, unimodal settings. To address these, we propose the first open-source, multilingual, label-agnostic, and lightweight classification framework built on MedGemma-4B (4B parameters). Our method integrates zero- and few-shot prompting with efficient fine-tuning and introduces domain-specific data augmentation. With only dozens of training samples, the model achieves expert-level performance across seven multilingual, multimodal datasets: it attains an average macro-F1 of 88 on five chest X-ray tasks and a weighted F1 of 82 on Danish reports using just 80 samples. The framework is deployable on consumer-grade GPUs (24 GB VRAM), significantly lowering the deployment barrier for LLMs in clinical practice.

Technology Category

Application Category

📝 Abstract
Radiology reports contain rich clinical information that can be used to train imaging models without relying on costly manual annotation. However, existing approaches face critical limitations: rule-based methods struggle with linguistic variability, supervised models require large annotated datasets, and recent LLM-based systems depend on closed-source or resource-intensive models that are unsuitable for clinical use. Moreover, current solutions are largely restricted to English and single-modality, single-taxonomy datasets. We introduce MOSAIC, a multilingual, taxonomy-agnostic, and computationally efficient approach for radiological report classification. Built on a compact open-access language model (MedGemma-4B), MOSAIC supports both zero-/few-shot prompting and lightweight fine-tuning, enabling deployment on consumer-grade GPUs. We evaluate MOSAIC across seven datasets in English, Spanish, French, and Danish, spanning multiple imaging modalities and label taxonomies. The model achieves a mean macro F1 score of 88 across five chest X-ray datasets, approaching or exceeding expert-level performance, while requiring only 24 GB of GPU memory. With data augmentation, as few as 80 annotated samples are sufficient to reach a weighted F1 score of 82 on Danish reports, compared to 86 with the full 1600-sample training set. MOSAIC offers a practical alternative to large or proprietary LLMs in clinical settings. Code and models are open-source. We invite the community to evaluate and extend MOSAIC on new languages, taxonomies, and modalities.
Problem

Research questions and friction points this paper is trying to address.

Classifying radiological reports across languages and taxonomies efficiently
Overcoming limitations of rule-based and resource-intensive supervised models
Enabling clinical deployment without costly annotations or proprietary systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual open-access language model MedGemma-4B
Supports zero-shot prompting and lightweight fine-tuning
Achieves expert-level performance with minimal GPU memory
🔎 Similar Papers
No similar papers found.