DINO-BOLDNet: A DINOv3-Guided Multi-Slice Attention Network for T1-to-BOLD Generation

📅 2025-12-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of downstream functional brain analysis when BOLD fMRI data are missing or corrupted, this paper proposes, for the first time, a direct reconstruction method that synthesizes mean BOLD images from T1-weighted structural MRI. Methodologically, we employ a frozen DINOv3 self-supervised encoder to extract robust anatomical representations, incorporate a multi-slice attention mechanism to model cross-layer functional dependencies, and integrate a multi-scale decoder with a DINO-perceptual loss to ensure high-fidelity generation. Evaluated on a clinical dataset comprising 248 subjects, our approach significantly outperforms conditional GAN baselines, achieving state-of-the-art performance in both PSNR and MS-SSIM metrics. This work establishes a novel, interpretable, and high-accuracy paradigm for structure-to-function mapping, enabling reliable functional inference from structural scans alone.

Technology Category

Application Category

📝 Abstract
Generating BOLD images from T1w images offers a promising solution for recovering missing BOLD information and enabling downstream tasks when BOLD images are corrupted or unavailable. Motivated by this, we propose DINO-BOLDNet, a DINOv3-guided multi-slice attention framework that integrates a frozen self-supervised DINOv3 encoder with a lightweight trainable decoder. The model uses DINOv3 to extract within-slice structural representations, and a separate slice-attention module to fuse contextual information across neighboring slices. A multi-scale generation decoder then restores fine-grained functional contrast, while a DINO-based perceptual loss encourages structural and textural consistency between predictions and ground-truth BOLD in the transformer feature space. Experiments on a clinical dataset of 248 subjects show that DINO-BOLDNet surpasses a conditional GAN baseline in both PSNR and MS-SSIM. To our knowledge, this is the first framework capable of generating mean BOLD images directly from T1w images, highlighting the potential of self-supervised transformer guidance for structural-to-functional mapping.
Problem

Research questions and friction points this paper is trying to address.

Generates BOLD images from T1w images
Recovers missing BOLD information for downstream tasks
Integrates DINOv3 for structural-to-functional mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen DINOv3 encoder for structural representation
Employs slice-attention module for cross-slice context fusion
Applies DINO-based perceptual loss for feature consistency
🔎 Similar Papers
No similar papers found.
J
Jianwei Wang
School of Computer Science and Engineering, Jiangsu Provincial Joint International Research Laboratory of Medical Information Processing, Southeast University, Nanjing, China
Q
Qing Wang
Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Jiangsu Key Laboratory of Brain Science and Medicine, Southeast University, Nanjing, China
M
Menglan Ruan
School of Software Engineering, Jiangsu Provincial Joint International Research Laboratory of Medical Information Processing, Southeast University, Nanjing, China
Rongjun Ge
Rongjun Ge
Associate Professor at Southeast University; RPI; UWO
medical image analysisimage processingartificial intelligence
C
Chunfeng Yang
School of Computer Science and Engineering, Jiangsu Provincial Joint International Research Laboratory of Medical Information Processing, Southeast University, Nanjing, China
Y
Yang Chen
School of Computer Science and Engineering, Jiangsu Provincial Joint International Research Laboratory of Medical Information Processing, Southeast University, Nanjing, China
C
Chunming Xie
Department of Neurology, Affiliated ZhongDa Hospital, School of Medicine, Jiangsu Key Laboratory of Brain Science and Medicine, Southeast University, Nanjing, China