Scalable Diffusion Transformer for Conditional 4D fMRI Synthesis

📅 2025-11-27
🤖 AI Summary
This paper targets three obstacles to fMRI synthesis: high dimensionality, heterogeneity across subjects and acquisitions, and the lack of neuroscientific validation. It introduces what the authors describe as the first framework for generating whole-brain 4D BOLD sequences conditioned on cognitive tasks. The method pairs a diffusion Transformer for voxel-level conditional generation with 3D VQ-GAN latent-space compression, a CNN-Transformer hybrid backbone, and task conditioning through AdaLN-Zero and cross-attention. On Human Connectome Project (HCP) task fMRI, generated sequences reach a correlation of 0.83 with ground-truth task activation maps and a representational similarity (RSA) score of 0.98; both metrics improve steadily with model scale and consistently exceed a U-Net baseline. The authors present this as a practical, neuroscience-validated route to generating task-conditioned cognitive responses.

πŸ“ Abstract
Generating whole-brain 4D fMRI sequences conditioned on cognitive tasks remains challenging due to the high-dimensional, heterogeneous BOLD dynamics across subjects/acquisitions and the lack of neuroscience-grounded validation. We introduce the first diffusion transformer for voxelwise 4D fMRI conditional generation, combining 3D VQ-GAN latent compression with a CNN-Transformer backbone and strong task conditioning via AdaLN-Zero and cross-attention. On HCP task fMRI, our model reproduces task-evoked activation maps, preserves the inter-task representational structure observed in real data (RSA), achieves perfect condition specificity, and aligns ROI time-courses with canonical hemodynamic responses. Performance improves predictably with scale, reaching task-evoked map correlation of 0.83 and RSA of 0.98, consistently surpassing a U-Net baseline on all metrics. By coupling latent diffusion with a scalable backbone and strong conditioning, this work establishes a practical path to conditional 4D fMRI synthesis, paving the way for future applications such as virtual experiments, cross-site harmonization, and principled augmentation for downstream neuroimaging models.
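The two headline metrics in the abstract (task-evoked map correlation and RSA) can be illustrated with a small sketch. This is not the authors' evaluation code: it assumes each task is summarized by a flattened activation map, that map similarity is Pearson correlation, and that the RSA score is the Pearson correlation between the upper triangles of the real and generated dissimilarity matrices (the paper may use a different distance or a rank correlation). The function names `pearson`, `rdm_upper`, and `rsa_score` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rdm_upper(task_maps):
    """Upper triangle of the representational dissimilarity matrix.

    task_maps: dict mapping task name -> flattened activation map.
    Dissimilarity is assumed to be 1 - Pearson r between two task maps.
    Tasks are sorted by name so real and generated RDMs line up.
    """
    names = sorted(task_maps)
    return [
        1.0 - pearson(task_maps[a], task_maps[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]


def rsa_score(real_maps, generated_maps):
    """Correlate the real and generated RDMs (assumed RSA definition)."""
    return pearson(rdm_upper(real_maps), rdm_upper(generated_maps))
```

Under these assumptions, the map correlation for one task is `pearson(real_maps[task], generated_maps[task])`, and `rsa_score` needs at least three tasks so that the dissimilarity vectors have more than one entry and non-zero variance.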
Problem

Research questions and friction points this paper is trying to address.

Generating whole-brain 4D fMRI sequences conditioned on cognitive tasks
Addressing high-dimensional heterogeneous BOLD dynamics across subjects
Lacking neuroscience-grounded validation for fMRI synthesis models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion transformer with 3D VQ-GAN latent compression
CNN-Transformer backbone with AdaLN-Zero conditioning
Scalable architecture for conditional 4D fMRI synthesis
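AdaLN-Zero conditioning, named in the list above, can be sketched in a few lines. The following is a generic, dependency-free illustration of the technique as commonly described for diffusion Transformers, not the paper's implementation: a conditioning vector is projected to a per-feature shift, scale, and gate, and that projection is zero-initialized so each block starts as the identity. The class `AdaLNZeroBlock`, the single linear layer standing in for the attention or MLP sublayer, and all dimensions are assumptions made for illustration.

```python
import random
from math import sqrt


def layer_norm(token, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance (no learned affine)."""
    mean = sum(token) / len(token)
    var = sum((v - mean) ** 2 for v in token) / len(token)
    return [(v - mean) / sqrt(var + eps) for v in token]


def linear(vec, weight, bias):
    """Apply a linear map; weight holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


class AdaLNZeroBlock:
    """Residual block modulated by a conditioning vector (hypothetical sketch)."""

    def __init__(self, dim, cond_dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Zero-initialized modulation head: shift, scale, and gate all start at 0.
        self.w_mod = [[0.0] * cond_dim for _ in range(3 * dim)]
        self.b_mod = [0.0] * (3 * dim)
        # Stand-in sublayer; a real block would use attention or an MLP here.
        self.w_sub = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(dim)]
        self.b_sub = [0.0] * dim

    def __call__(self, tokens, cond):
        d = self.dim
        mod = linear(cond, self.w_mod, self.b_mod)
        shift, scale, gate = mod[:d], mod[d:2 * d], mod[2 * d:]
        out = []
        for token in tokens:
            # Adaptive layer norm: normalize, then scale and shift from the condition.
            h = [n * (1.0 + s) + b for n, s, b in zip(layer_norm(token), scale, shift)]
            h = linear(h, self.w_sub, self.b_sub)
            # Gated residual: with a zero gate the block returns its input unchanged.
            out.append([x + g * y for x, g, y in zip(token, gate, h)])
        return out
```

Because the gate starts at zero, a freshly constructed block returns its input tokens unchanged for any condition, which is the property the "Zero" in the name refers to; the condition only begins to influence the output once training moves the modulation weights away from zero.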