VolTA-3D: Self-Supervised Learning for Brain MRI using 3D Volumetric Token Alignment

📅 2026-05-15
🤖 AI Summary
Current self-supervised methods for 3D brain MRI generalize poorly across datasets, imaging protocols, and downstream tasks. To address this, this work proposes VolTA-3D, a framework that jointly models global semantic consistency and local anatomical structure in 3D self-supervised representation learning, presented as a first in this domain. Built on a 3D Vision Transformer within a student-teacher architecture, VolTA-3D aligns global class-style tokens with local patch tokens and adds fine-grained structural reconstruction. On out-of-distribution downstream tasks, including hippocampal segmentation, sex classification, and Alzheimer’s disease identification, the method outperforms randomly initialized baselines, indicating that the learned representations transfer well and remain robust under domain shift.
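To make the idea of volumetric tokens concrete, here is one common way a 3D Vision Transformer can turn an MRI volume into a token sequence: split the volume into non-overlapping cubic patches, flatten and linearly project each patch, and prepend a class-style token. This is an illustrative assumption, not the authors' code. The function names `patchify_3d` and `embed_tokens`, the 16-voxel patch size, the 96-voxel volume, and the 64-dimensional embedding are all hypothetical, and positional embeddings are omitted for brevity.

```python
import numpy as np


def patchify_3d(volume: np.ndarray, patch: int) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping cubic patches (hypothetical helper).

    Returns an array of shape (num_patches, patch**3), one flattened patch per row,
    ordered depth-major, then height, then width.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by the patch size")
    nd, nh, nw = d // patch, h // patch, w // patch
    # Split each axis into (num_blocks, patch): shape (nd, p, nh, p, nw, p).
    x = volume.reshape(nd, patch, nh, patch, nw, patch)
    # Group block indices first: shape (nd, nh, nw, p, p, p).
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(nd * nh * nw, patch ** 3)


def embed_tokens(patches: np.ndarray, weight: np.ndarray, cls_token: np.ndarray) -> np.ndarray:
    """Linearly project patches and prepend a class-style token (hypothetical helper).

    patches: (N, patch**3); weight: (patch**3, dim); cls_token: (dim,).
    Returns (N + 1, dim): row 0 is the class-style token, rows 1..N are patch tokens.
    """
    tokens = patches @ weight
    return np.concatenate([cls_token[None, :], tokens], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.standard_normal((96, 96, 96))  # stand-in for a preprocessed MRI volume
    patches = patchify_3d(vol, 16)  # 6 x 6 x 6 = 216 patches of 16^3 = 4096 voxels
    tokens = embed_tokens(patches, 0.01 * rng.standard_normal((4096, 64)), np.zeros(64))
    print(patches.shape, tokens.shape)
```

In this layout the class-style token (row 0) is what a global alignment objective would act on, while the remaining rows are the local patch tokens.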
📝 Abstract
Self-supervised learning (SSL) has advanced medical image analysis by enabling learning from large unlabeled datasets. However, in brain magnetic resonance imaging (MRI), most 3D models remain specialized for either segmentation or classification, limiting their ability to generalize across datasets, imaging protocols, and downstream tasks. This lack of transferability constrains the clinical utility of 3D MRI models, despite the availability of unlabeled volumetric data. We present VolTA-3D, a self-supervised 3D Vision Transformer framework designed to learn transferable volumetric representations. VolTA-3D jointly aligns global class-style tokens and local patch tokens within a student-teacher paradigm and enforces fine-grained structural reconstruction. This combined global-local alignment addresses the limited semantic diversity and subtle anatomical characteristics of brain MRI, which challenge existing SSL approaches. We evaluate VolTA-3D on multiple out-of-distribution downstream tasks, including hippocampal segmentation, sex classification, and classification of Alzheimer's disease versus healthy controls. Across all tasks, representations learned by VolTA-3D outperform randomly initialized baselines, demonstrating improved transferability and robustness under domain shift. Hence, jointly enforcing global semantic consistency and local structural learning during pretraining enables broader concept learning from unlabeled brain MRI data. Overall, VolTA-3D supports effective multi-task downstream performance with task-specific pretraining, a step towards generalizable and clinically viable 3D models.
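The abstract does not give the loss functions. As a hedged sketch only, the global-local objective it describes can be written in the style of DINO/iBOT self-distillation: a cross-entropy between sharpened teacher and student distributions on the class-style token (global term), the same cross-entropy on masked patch tokens (local term), a mean-squared error on reconstructed voxel patches (structural term), and a teacher updated as an exponential moving average (EMA) of the student. Every function name, the equal weighting of terms, the temperatures (0.1 student, 0.04 teacher), and the 0.996 momentum are assumptions borrowed from common DINO-style defaults, not values reported for VolTA-3D; teacher-output centering is omitted.

```python
import numpy as np


def softmax(x: np.ndarray, temp: float) -> np.ndarray:
    """Temperature-scaled softmax over the last axis (numerically stabilized)."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def log_softmax(x: np.ndarray, temp: float) -> np.ndarray:
    """Temperature-scaled log-softmax over the last axis (numerically stabilized)."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distill_ce(student_logits, teacher_logits, t_s=0.1, t_t=0.04) -> float:
    """Cross-entropy of student predictions against sharpened teacher targets,
    averaged over tokens (rows). Inputs: (num_tokens, K) projection-head logits."""
    p_t = softmax(teacher_logits, t_t)
    return float(-(p_t * log_softmax(student_logits, t_s)).sum(axis=-1).mean())


def global_local_loss(s_cls, t_cls, s_patch, t_patch, mask, recon, target, w_rec=1.0) -> float:
    """Hypothetical combined objective (assumes at least one masked patch).

    s_cls, t_cls: (K,) class-token logits from student and teacher.
    s_patch, t_patch: (N, K) patch-token logits from student and teacher.
    mask: (N,) bool, True where the student's input patch was masked.
    recon, target: (N, P) reconstructed vs. original voxel patches.
    """
    l_global = distill_ce(s_cls[None], t_cls[None])        # global semantic consistency
    l_local = distill_ce(s_patch[mask], t_patch[mask])     # local patch-token alignment
    l_rec = float(((recon[mask] - target[mask]) ** 2).mean())  # structural reconstruction
    return l_global + l_local + w_rec * l_rec


def ema_update(teacher: dict, student: dict, momentum: float = 0.996) -> dict:
    """Return teacher parameters moved towards the student by an EMA step."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}
```

Only the student would receive gradients from this loss; the teacher changes solely through `ema_update`, which is the usual way such student-teacher setups avoid collapse together with centering and sharpening.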
Problem

Research questions and friction points this paper is trying to address.

self-supervised learning
brain MRI
3D representation
transferability
domain shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
3D Vision Transformer
volumetric token alignment
transferable representation
brain MRI
Authors
Amy Makawana
Institute of Health Informatics, University College London, London, UK
Abhijeet Parida
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington, DC, USA
Marius George Linguraru
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington, DC, USA; School of Medicine and Health Sciences, George Washington University, Washington, DC, USA
Julia Ive
University College London
Syed Muhammad Anwar
Children’s National Hospital / George Washington University
Biomedical signal processing, medical image analysis, graph learning, self-supervised learning