MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata

📅 2025-12-10
🤖 AI Summary
Current medical AI models predominantly rely on task-specific conditional distribution modeling, which limits their ability to support flexible cross-modal and cross-task reasoning. To address this, we propose MetaVoxel, a unified diffusion-based generative framework that jointly models medical imaging (T1-weighted MRI) and structured clinical metadata (e.g., age, sex). MetaVoxel learns the multimodal joint distribution via a single end-to-end diffusion process, incorporating cross-modal embedding alignment and joint noise prediction to enable zero-shot inference from arbitrary input subsets. Evaluated on more than 10,000 MRI scans from nine heterogeneous sources, a single MetaVoxel model performs image synthesis, continuous age estimation, and binary sex classification, with performance comparable to dedicated task-specific baselines. This departs from conventional conditional modeling and points toward more general, more easily deployed medical AI models.

📝 Abstract
Modern deep learning methods have achieved impressive results across tasks ranging from disease classification and continuous biomarker estimation to realistic medical image generation. Most of these approaches are trained to model conditional distributions defined by a specific predictive direction with a specific set of input variables. We introduce MetaVoxel, a generative joint diffusion modeling framework that models the joint distribution over imaging data and clinical metadata by learning a single diffusion process spanning all variables. By capturing the joint distribution, MetaVoxel unifies tasks that traditionally require separate conditional models and supports flexible zero-shot inference using arbitrary subsets of inputs without task-specific retraining. Using more than 10,000 T1-weighted MRI scans paired with clinical metadata from nine datasets, we show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction, achieving performance comparable to established task-specific baselines. Additional experiments highlight its capabilities for flexible inference. Together, these findings demonstrate that joint multimodal diffusion offers a promising direction for unifying medical AI models and enabling broader clinical applicability.
Problem

Research questions and friction points this paper is trying to address.

Models joint distribution of imaging and clinical metadata
Unifies tasks requiring separate conditional models
Enables flexible zero-shot inference without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint diffusion modeling of imaging and metadata
Single diffusion process spanning all variables
Flexible zero-shot inference without retraining
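
The idea of one diffusion process over all variables, queried with arbitrary input subsets, can be illustrated with a toy sketch. This is not MetaVoxel's implementation: the summary does not describe the architecture or sampling procedure, so everything below is an assumption made for illustration. That includes the flattened 8-value "image", the normalized age and sex values, the linear DDPM-style noise schedule, and the mask-based conditioning in which observed entries are kept clean and only unobserved entries are noised. The helper names `make_alpha_bar` and `noise_joint` are hypothetical.

```python
import numpy as np

IMAGE_DIM = 8            # toy stand-in for a flattened MRI volume (assumption)
AGE_IDX = IMAGE_DIM      # position of normalized age in the joint vector
SEX_IDX = IMAGE_DIM + 1  # position of binary sex in the joint vector


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal retention for a linear, DDPM-style beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_joint(x0, observed_mask, t, alpha_bar, rng):
    """Apply forward diffusion at step t to the unobserved entries only.

    Observed entries (mask == True) are returned unchanged, so one model
    can in principle be asked for any subset of variables given the rest.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.where(observed_mask, x0, x_t), eps


rng = np.random.default_rng(0)
image = rng.standard_normal(IMAGE_DIM)
x0 = np.concatenate([image, [0.3, 1.0]])  # [voxels..., age, sex]

# Query "estimate age from image and sex": only the age slot is unobserved.
mask = np.ones_like(x0, dtype=bool)
mask[AGE_IDX] = False

alpha_bar = make_alpha_bar()
x_t, eps = noise_joint(x0, mask, t=len(alpha_bar) - 1, alpha_bar=alpha_bar, rng=rng)
```

Under these assumptions, switching tasks only means changing the mask: marking the image slots as unobserved corresponds to image generation from metadata, and marking `SEX_IDX` as unobserved corresponds to sex prediction. A trained denoiser, which is omitted here, would then iteratively recover the unobserved entries.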
Yihao Liu
Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, US.
Chenyu Gao
Electrical and Computer Engineering, Vanderbilt University
Medical Image Analysis, Computer Vision
Lianrui Zuo
Vanderbilt University
Medical image analysis, MRI, CT, Image harmonization, Image synthesis
Michael E. Kim
Department of Computer Science, Vanderbilt University, Nashville, TN, US.
Brian D. Boyd
Center for Cognitive Medicine, Department of Psychiatry and Behavioral Science, Vanderbilt University Medical Center, Nashville, TN, US.
Lisa L. Barnes
Department of Neurological Sciences and Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL.
Walter A. Kukull
Washington University in St. Louis, St. Louis, MO, US.
Lori L. Beason-Held
Laboratory of Behavioral Neuroscience, National Institute on Aging, National Institutes of Health, Baltimore, MD.
Susan M. Resnick
Laboratory of Behavioral Neuroscience, National Institute on Aging, National Institutes of Health, Baltimore, MD.
Timothy J. Hohman
Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Nashville, TN.
Warren D. Taylor
Center for Cognitive Medicine, Department of Psychiatry and Behavioral Science, Vanderbilt University Medical Center, Nashville, TN, US.
Bennett A. Landman
Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, US.