AI Summary
Public neuroimaging data are scarce and highly privacy-sensitive, which limits the clinical generalizability of existing AI models. To address this, we propose “health system learning,” a paradigm in which the general-purpose volumetric foundation model NeuroVFM is trained directly on 5.24 million uncurated clinical CT and MRI volumes generated during routine care. NeuroVFM uses a scalable volumetric joint-embedding predictive architecture and, when paired with an open-source language model through lightweight visual instruction tuning, generates radiology reports. The model exhibits emergent neuroanatomical understanding and interpretable localization of diagnostic findings. It achieves state-of-the-art performance across multiple clinical tasks, including radiologic diagnosis and report generation. Its generated reports outperform frontier models in diagnostic accuracy, clinical triage, and expert preference, and they contain fewer hallucinated findings and critical errors.
Abstract
Frontier artificial intelligence (AI) models, such as OpenAI's GPT-5 and Meta's DINOv3, have advanced rapidly through training on internet-scale public data, yet such systems lack access to private clinical data. Neuroimaging, in particular, is underrepresented in the public domain due to identifiable facial features within MRI and CT scans, fundamentally restricting model performance in clinical medicine. Here, we show that frontier models underperform on neuroimaging tasks and that learning directly from uncurated data generated during routine clinical care at health systems, a paradigm we call health system learning, yields high-performance, generalist neuroimaging models. We introduce NeuroVFM, a visual foundation model trained on 5.24 million clinical MRI and CT volumes using a scalable volumetric joint-embedding predictive architecture. NeuroVFM learns comprehensive representations of brain anatomy and pathology, achieving state-of-the-art performance across multiple clinical tasks, including radiologic diagnosis and report generation. The model exhibits emergent neuroanatomic understanding and interpretable visual grounding of diagnostic findings. When paired with open-source language models through lightweight visual instruction tuning, NeuroVFM generates radiology reports that surpass frontier models in accuracy, clinical triage, and expert preference. Through clinically grounded visual understanding, NeuroVFM reduces hallucinated findings and critical errors, offering safer clinical decision support. These results establish health system learning as a paradigm for building generalist medical AI and provide a scalable framework for clinical foundation models.
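The abstract names a volumetric joint-embedding predictive architecture (JEPA) but does not describe its training objective. As a rough illustration only, the sketch below assumes the generic JEPA recipe from prior image work: cut a volume into non-overlapping 3D patches, hide a random subset, and train a predictor to match a separately maintained target encoder's embeddings of the hidden patches, with that target encoder updated as an exponential moving average of the context encoder. The functions `patchify`, `split_context_target`, `jepa_loss`, and `ema_update`, along with the patch size, mask ratio, embedding size, and toy linear encoders, are hypothetical and are not NeuroVFM's code or configuration.

```python
import random

# Hypothetical sketch of a generic volumetric JEPA-style objective.
# Patch size, mask ratio, embedding size, and the toy linear "encoders"
# are illustrative assumptions, not NeuroVFM's actual configuration.


def patchify(volume, p):
    """Split a D x H x W volume (nested lists) into non-overlapping p*p*p
    cubes, each flattened to a vector of length p**3 in raster order."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    if d % p or h % p or w % p:
        raise ValueError("volume dims must be divisible by the patch size")
    patches = []
    for z in range(0, d, p):
        for y in range(0, h, p):
            for x in range(0, w, p):
                patches.append([volume[z + i][y + j][x + k]
                                for i in range(p) for j in range(p) for k in range(p)])
    return patches


def split_context_target(n, target_ratio, rng):
    """Randomly partition patch indices into disjoint context and target sets."""
    idx = list(range(n))
    rng.shuffle(idx)
    k = max(1, int(n * target_ratio))
    return sorted(idx[k:]), sorted(idx[:k])


def linear(weights, vec):
    """Matrix-vector product with weights stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def jepa_loss(patches, context, target, ctx_w, tgt_w, pred_w, pos):
    """Mean squared error between predicted and target-encoder embeddings.

    The loss lives in embedding space (no voxel reconstruction). The toy
    predictor mean-pools context embeddings and adds a positional vector
    for each target index, so different targets get different predictions.
    """
    ctx = [linear(ctx_w, patches[i]) for i in context]
    pooled = [sum(col) / len(ctx) for col in zip(*ctx)]
    total = 0.0
    for i in target:
        pred = linear(pred_w, [a + b for a, b in zip(pooled, pos[i])])
        # Target branch: treated as a constant (stop-gradient) in real training.
        tgt = linear(tgt_w, patches[i])
        total += sum((a - b) ** 2 for a, b in zip(pred, tgt)) / len(tgt)
    return total / len(target)


def ema_update(target_w, online_w, momentum):
    """Move target-encoder weights toward the context encoder by EMA."""
    return [[momentum * t + (1 - momentum) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_w, online_w)]


def random_matrix(rng, rows, cols, scale=0.1):
    """Small Gaussian-initialized weight matrix for the toy example."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    p, dim = 2, 4
    vol = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(4)]
    patches = patchify(vol, p)  # 8 patches of 8 voxels each
    ctx_idx, tgt_idx = split_context_target(len(patches), 0.5, rng)
    ctx_w = random_matrix(rng, dim, p ** 3)
    tgt_w = [row[:] for row in ctx_w]  # target encoder starts as a copy
    pred_w = random_matrix(rng, dim, dim)
    pos = random_matrix(rng, len(patches), dim)
    print(len(patches), ctx_idx, tgt_idx)
    print(jepa_loss(patches, ctx_idx, tgt_idx, ctx_w, tgt_w, pred_w, pos))
    # In real training an optimizer step on ctx_w and pred_w (omitted) would
    # precede this EMA update of the target encoder.
    tgt_w = ema_update(tgt_w, ctx_w, momentum=0.996)
```

Computing the loss between embeddings, rather than reconstructing masked voxels, is what distinguishes JEPA-style training from masked-autoencoder reconstruction. A real model would replace the toy linear maps with transformer encoders over 3D patch coordinates and learn the weights by gradient descent, all of which is omitted here.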