Diffusion MRI Transformer with a Diffusion Space Rotary Positional Embedding (D-RoPE)

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deep learning approaches struggle to effectively model the complex dependencies among spatial structure, diffusion directions, and associated weights in diffusion MRI (dMRI), and they generalize poorly across acquisition protocols, particularly when the number of diffusion directions varies. To address these challenges, this work proposes a Transformer architecture tailored to dMRI, introducing for the first time a diffusion-aware rotary positional embedding (D-RoPE) that jointly encodes spatial and directional information. This design accepts an arbitrary number of diffusion directions as input and improves robustness across scanning protocols. Combined with self-supervised masked-autoencoding pretraining to learn general-purpose representations, the model achieves significant improvements on downstream tasks: fine-tuning yields a 6% increase in accuracy for mild cognitive impairment classification and a 0.05 gain in correlation coefficient for cognitive score prediction, substantially outperforming baseline methods.
📝 Abstract
Diffusion Magnetic Resonance Imaging (dMRI) plays a critical role in studying microstructural changes in the brain. It is therefore widely used in clinical practice, yet progress in learning general-purpose representations from dMRI has been limited. A key challenge is that existing deep learning approaches are not well suited to capturing the unique properties of diffusion signals. Brain dMRI typically comprises several brain volumes, each with different attenuation characteristics that depend on the direction and strength of the diffusion-sensitizing gradients. There is thus a need to jointly model spatial, diffusion-weighting, and directional dependencies in dMRI. Moreover, varying acquisition protocols (e.g., differing numbers of directions) further limit traditional models. To address these gaps, we introduce a diffusion space rotary positional embedding (D-RoPE), plugged into our dMRI transformer, that captures both the spatial structure and directional characteristics of diffusion data, enabling robust and transferable representations across diverse acquisition settings and an arbitrary number of diffusion directions. After self-supervised masked-autoencoding pretraining, experiments on several downstream tasks show that the learned representations and the pretrained model provide competitive or superior performance relative to several baselines (even a fully trained one): fine-tuned features from our pretrained encoder yielded a 6% higher accuracy in classifying mild cognitive impairment and a 0.05 increase in the correlation coefficient when predicting cognitive scores. Code is available at: github.com/gustavochau/D-RoPE.
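D-RoPE extends the standard rotary positional embedding (RoPE) to encode spatial and diffusion-direction information jointly. As background only, the sketch below shows how a standard RoPE works, which is the mechanism D-RoPE builds on; it is not the paper's diffusion-space variant, and the function name `rope_rotate` is chosen here for illustration. RoPE rotates each even/odd feature pair by a position-dependent angle, so that dot products between rotated queries and keys depend only on the relative offset between positions.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply a standard rotary positional embedding (RoPE) to feature
    vector x at scalar position pos. Each even/odd feature pair is
    rotated by a position-dependent angle with geometrically spaced
    frequencies (as in the original RoPE formulation)."""
    d = x.shape[-1]
    assert d % 2 == 0, "feature dimension must be even"
    # One rotation frequency per feature pair.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Relative-position property: the inner product of a rotated query and
# key depends only on the offset between their positions.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
a = rope_rotate(q, 3) @ rope_rotate(k, 5)    # offset 2
b = rope_rotate(q, 10) @ rope_rotate(k, 12)  # offset 2
print(np.allclose(a, b))  # True
```

Because the rotation is applied per attention query/key rather than added to the input, such an embedding imposes no fixed sequence length, which is consistent with the abstract's claim of handling an arbitrary number of diffusion directions.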
Problem

Research questions and friction points this paper is trying to address.

diffusion MRI
representation learning
directional dependency
acquisition protocol variability
spatial-diffusion modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion MRI Transformer
Rotary Positional Embedding
Self-supervised Learning
Directional Encoding
dMRI Representation Learning
Gustavo Chau Loo Kung
Stanford University
Mohammad Abbasi
Stanford University
Camila Blank
Stanford University
Juze Zhang
Stanford University
Alan Q. Wang
Stanford University
Sophie Ostmeier
Stanford University
ML, Medicine
Akshay Chaudhari
Assistant Professor, Stanford University
Biomedical Imaging, Multi-Modal Learning, Deep Learning, Radiology
Kilian Pohl
Stanford University
Ehsan Adeli
Stanford University
Computer Vision, Computational Neuroscience, Precision Healthcare, Ambient Intelligence