ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression

📅 2026-04-15

🤖 AI Summary

Alzheimer’s disease (AD) progresses in highly heterogeneous ways, and existing methods struggle both to integrate multidimensional clinical data and to control follow-up timing precisely when synthesizing longitudinal brain MRI. To address this, the work proposes ADP-DiT, an interval-aware, clinically guided diffusion Transformer that encodes the follow-up interval together with demographic, diagnostic, and neuropsychological data as a natural-language prompt. The prompt is embedded by two text encoders (OpenCLIP and T5) for a richer semantic representation, and the resulting embeddings are injected into the DiT backbone through adaptive layer normalization and cross-attention. Evaluated on 3,321 scans from 712 subjects, ADP-DiT reaches an SSIM of 0.8739 and a PSNR of 29.32 dB, improving over a DiT baseline by +0.1087 SSIM and +6.08 dB PSNR, while reproducing hallmark AD progression patterns such as ventricular enlargement and hippocampal atrophy. The result is fine-grained, interpretable, and temporally controllable longitudinal image synthesis.
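To make the idea of "clinical data as a natural-language prompt" concrete, here is a minimal sketch of how one visit's metadata might be serialized into text. The paper's actual prompt template is not given here, so the field names, the choice of MMSE and CDR-SB as example neuropsychological scores, and the sentence wording are all hypothetical.

```python
from dataclasses import dataclass

VALID_DX = {"CN", "MCI", "AD"}


@dataclass
class Visit:
    # Hypothetical fields; the real ADP-DiT prompt template may differ.
    age: float
    sex: str
    diagnosis: str           # one of "CN", "MCI", "AD"
    mmse: int                # example neuropsychological score (assumed)
    cdr_sb: float            # example neuropsychological score (assumed)
    months_to_followup: int  # follow-up interval the model should synthesize


def build_prompt(v: Visit) -> str:
    """Serialize clinical metadata into a single text prompt (illustrative template)."""
    if v.diagnosis not in VALID_DX:
        raise ValueError(f"unknown diagnosis: {v.diagnosis}")
    if v.months_to_followup < 0:
        raise ValueError("follow-up interval must be non-negative")
    # The interval comes first so the time condition is never truncated
    # by a text encoder with a short context window (a design assumption).
    return (
        f"Follow-up MRI {v.months_to_followup} months later. "
        f"{v.age:.0f}-year-old {v.sex}, diagnosis {v.diagnosis}, "
        f"MMSE {v.mmse}, CDR-SB {v.cdr_sb:.1f}."
    )


print(build_prompt(Visit(74.3, "female", "MCI", 27, 1.5, 24)))
# Follow-up MRI 24 months later. 74-year-old female, diagnosis MCI, MMSE 27, CDR-SB 1.5.
```

Because the interval is just another token sequence in the prompt, changing `months_to_followup` while holding the other fields fixed is what would give time-specific control in this framing.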

📝 Abstract

Alzheimer's disease (AD) progresses heterogeneously across individuals, motivating subject-specific synthesis of follow-up magnetic resonance imaging (MRI) to support progression assessment. While Diffusion Transformers (DiT), an emerging class of transformer-based diffusion models, offer a scalable backbone for image synthesis, longitudinal AD MRI generation with clinically interpretable control over follow-up time and participant metadata remains underexplored. We present ADP-DiT, an interval-aware, clinically text-conditioned diffusion transformer for longitudinal AD MRI synthesis. ADP-DiT encodes the follow-up interval together with multi-domain demographic, diagnostic (CN/MCI/AD), and neuropsychological information as a natural-language prompt, enabling time-specific control beyond coarse diagnostic stages. To inject this conditioning effectively, we use dual text encoders: OpenCLIP for vision-language alignment and T5 for richer clinical-language understanding. Their embeddings are fused into DiT through cross-attention for fine-grained guidance and adaptive layer normalization for global modulation. We further enhance anatomical fidelity by applying rotary positional embeddings to image tokens and performing diffusion in a pre-trained SDXL-VAE latent space, which enables efficient high-resolution reconstruction. On 3,321 longitudinal 3T T1-weighted scans from 712 participants (259,038 image slices), ADP-DiT achieves an SSIM of 0.8739 and a PSNR of 29.32 dB, improving over a DiT baseline by +0.1087 SSIM and +6.08 dB PSNR while capturing progression-related changes such as ventricular enlargement and hippocampal shrinkage. These results suggest that integrating comprehensive, subject-specific clinical conditions with diffusion transformer architectures can improve longitudinal AD MRI synthesis.
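The two conditioning paths described above (cross-attention for fine-grained, token-level guidance and adaptive layer normalization for global modulation) can be illustrated with a small NumPy sketch of one conditioned block. This is not the ADP-DiT implementation: the random weights, the projection-and-concatenate fusion of OpenCLIP and T5 tokens, the contents of the global condition vector, and all dimensions are assumptions chosen only to show the data flow.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its feature dimension (no learned affine here)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def adaln(x, cond, w_mod):
    # Adaptive LayerNorm: a global condition vector predicts a per-channel
    # scale and shift that modulate every image token identically
    scale, shift = np.split(cond @ w_mod, 2)
    return layer_norm(x) * (1.0 + scale) + shift


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(img, txt, wq, wk, wv):
    # Image tokens act as queries; fused text tokens supply keys and values
    q, k, v = img @ wq, txt @ wk, txt @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights


def fuse_text(clip_tokens, t5_tokens, proj_clip, proj_t5):
    # Assumed fusion: project each encoder's tokens to the model width,
    # then concatenate along the sequence axis
    return np.concatenate([clip_tokens @ proj_clip, t5_tokens @ proj_t5], axis=0)


def conditioned_block(img, clip_tokens, t5_tokens, global_cond, p):
    txt = fuse_text(clip_tokens, t5_tokens, p["proj_clip"], p["proj_t5"])
    h = adaln(img, global_cond, p["w_mod"])  # global modulation
    out, weights = cross_attention(h, txt, p["wq"], p["wk"], p["wv"])  # token-level guidance
    return img + out, weights  # residual connection


def init_params(rng, d, d_clip, d_t5, d_cond):
    def dense(m, n):
        return rng.standard_normal((m, n)) / np.sqrt(m)

    return {
        "proj_clip": dense(d_clip, d), "proj_t5": dense(d_t5, d),
        "w_mod": dense(d_cond, 2 * d),
        "wq": dense(d, d), "wk": dense(d, d), "wv": dense(d, d),
    }


rng = np.random.default_rng(0)
d, d_clip, d_t5, d_cond = 64, 32, 48, 16
params = init_params(rng, d, d_clip, d_t5, d_cond)
img = rng.standard_normal((16, d))            # 16 latent patch tokens
clip_tokens = rng.standard_normal((8, d_clip))
t5_tokens = rng.standard_normal((12, d_t5))
global_cond = rng.standard_normal(d_cond)     # e.g. timestep + pooled text embedding (assumed)

out, weights = conditioned_block(img, clip_tokens, t5_tokens, global_cond, params)
print(out.shape, weights.shape)  # (16, 64) (16, 20)
```

The split mirrors the abstract's description: AdaLN applies one scale and shift to all image tokens, so it can only carry global information, whereas each row of `weights` lets an individual image token attend to different prompt tokens (for example, a diagnosis word versus the follow-up interval).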

Problem

Research questions and friction points this paper is trying to address.

Alzheimer's disease

longitudinal MRI synthesis

text-guided generation

disease progression modeling

subject-specific imaging

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Transformer

Text-to-Image Synthesis

Longitudinal MRI