Curia-2: Scaling Self-Supervised Learning for Radiology Foundation Models

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work introduces the first billion-parameter multimodal radiology foundation model covering both CT and MRI, aiming to alleviate the growing workload of radiologists. By enhancing the self-supervised pretraining strategy and representation learning mechanism of the Curia framework, the approach enables efficient training of large-scale Vision Transformers on 2D and 3D medical images. The study also restructures the evaluation benchmark CuriaBench into a dual-track system to better assess model performance. The resulting model surpasses existing foundation models on general vision tasks and performs comparably to vision-language models on clinically critical tasks such as lesion detection, significantly advancing self-supervised learning in multimodal radiological imaging.
📝 Abstract
The rapid growth of medical imaging has fueled the development of Foundation Models (FMs) to reduce the growing, unsustainable workload on radiologists. While recent FMs have shown the power of large-scale pre-training for CT and MRI analysis, there remains significant room to optimize how these models learn from complex radiological volumes. Building upon the Curia framework, this work introduces Curia-2, which significantly improves the original pre-training strategy and representation quality to better capture the specificities of radiological data. The proposed methodology enables scaling the architecture up to billion-parameter Vision Transformers, marking a first for multi-modal CT and MRI FMs. Furthermore, we formalize the evaluation of these models by extending and restructuring CuriaBench into two distinct tracks: a 2D track tailored for slice-based vision models and a 3D track for volumetric benchmarking. Our results demonstrate that Curia-2 outperforms all FMs on vision-focused tasks and fares competitively with vision-language models on clinically complex tasks such as finding detection. Weights will be made publicly available to foster further research.
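The abstract does not spell out the self-supervised objective used for pre-training. A common choice for ViT pre-training on images is masked-patch modeling (MAE-style): the image is split into non-overlapping patch tokens, a large fraction is hidden, and the model learns to reconstruct the hidden patches from the visible ones. The sketch below illustrates only the tokenize-and-mask step on a toy 2D slice; it is a minimal illustration under that assumption, not the paper's actual pipeline, and all names (`patchify`, `mask_patches`) are hypothetical.

```python
import random

def patchify(slice_2d, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch
    tiles, flattened row-major, as a ViT tokenizer would."""
    h, w = len(slice_2d), len(slice_2d[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            patches.append([slice_2d[i + di][j + dj]
                            for di in range(patch) for dj in range(patch)])
    return patches

def mask_patches(patches, mask_ratio, seed=0):
    """Randomly hide a fraction of patch tokens; in masked-image-modeling
    the encoder sees only the visible tokens and a light decoder is trained
    to reconstruct the hidden ones."""
    rng = random.Random(seed)
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = set(rng.sample(range(len(patches)), n_masked))
    visible = [p for k, p in enumerate(patches) if k not in masked_idx]
    return visible, sorted(masked_idx)

# Toy 8x8 "CT slice" split into 4x4 patches -> 4 tokens; mask 75% of them.
slice_2d = [[r * 8 + c for c in range(8)] for r in range(8)]
tokens = patchify(slice_2d, patch=4)
visible, hidden = mask_patches(tokens, mask_ratio=0.75)
print(len(tokens), len(visible), len(hidden))  # prints: 4 1 3
```

A high mask ratio keeps the encoder's sequence short (here a single visible token), which is what makes this family of objectives cheap enough to scale to billion-parameter encoders; for the 3D track, the same idea extends to cubic patches over volumes.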
Problem

Research questions and friction points this paper is trying to address.

Foundation Models
Self-Supervised Learning
Radiology
Medical Imaging
Vision Transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Supervised Learning
Radiology Foundation Models
Vision Transformers
Multi-Modal Medical Imaging
CuriaBench
Antoine Saporta
Raidium, 27 rue du Faubourg Saint-Jacques, 75014 Paris, France
Baptiste Callard
Raidium, 27 rue du Faubourg Saint-Jacques, 75014 Paris, France
Corentin Dancette
Raidium
Deep Learning, Visual Question Answering, Biases, Computer Vision, Medical Imaging
Julien Khlaut
Raidium
Charles Corbière
Senior ML Researcher, Raidium
deep learning, computer vision, medical imaging, AI safety
Leo Butsanets
Raidium, 27 rue du Faubourg Saint-Jacques, 75014 Paris, France
Amaury Prat
Raidium, 27 rue du Faubourg Saint-Jacques, 75014 Paris, France
Pierre Manceron
Raidium
artificial intelligence, healthcare, robotics