🤖 AI Summary
This study addresses the challenge of non-biological heterogeneity in multi-center, multi-sequence brain MRI caused by variations in scanners and protocols, which severely impairs model generalization. To this end, the authors propose MMH, a unified framework that, for the first time, achieves cross-site, multi-sequence MRI style alignment and disentanglement without requiring paired data. The method integrates biomedical semantic priors through a tri-planar attention BiomedCLIP encoder to explicitly separate anatomical structure from imaging style, and further employs a diffusion-based global harmonizer and a target-domain fine-tuner to enable sequence-aware style normalization. Evaluated on 4,163 T1- and T2-weighted MRI scans, MMH significantly outperforms existing approaches in feature clustering, voxel-wise contrast, tissue segmentation, and downstream tasks including age and site classification.
📝 Abstract
Aggregating multi-site brain MRI data can enhance deep learning model training, but it also introduces non-biological heterogeneity caused by site-specific variations (e.g., differences in scanner vendors, acquisition parameters, and imaging protocols) that can undermine generalizability. Recent retrospective MRI harmonization seeks to reduce such site effects by standardizing image style (e.g., intensity, contrast, noise patterns) while preserving anatomical content. However, existing methods often rely on limited paired traveling-subject data or fail to effectively disentangle style from anatomy. Furthermore, most current approaches address only single-sequence harmonization, restricting their use in real-world settings where multi-sequence MRI is routinely acquired. To address these limitations, we introduce MMH, a unified framework for multi-site, multi-sequence brain MRI harmonization that leverages biomedical semantic priors for sequence-aware style alignment. MMH operates in two stages: (1) a diffusion-based global harmonizer that maps MR images to a sequence-specific unified domain using style-agnostic gradient conditioning, and (2) a target-specific fine-tuner that adapts globally aligned images to desired target domains. A tri-planar attention BiomedCLIP encoder aggregates multi-view embeddings to characterize volumetric style information, allowing explicit disentanglement of image style from anatomy without requiring paired data. Evaluations on 4,163 T1- and T2-weighted MRIs demonstrate that MMH outperforms state-of-the-art harmonization methods in image feature clustering, voxel-level comparison, tissue segmentation, and downstream age and site classification.
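The tri-planar aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `encode_slice` is a hypothetical stand-in for BiomedCLIP's 2D image encoder, and the single-query attention pooling over slice embeddings is an assumed mechanism for combining the axial, coronal, and sagittal views into one volumetric style descriptor.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def triplanar_style_embedding(volume, encode_slice, w_q, n_slices=3):
    """Attention-pool 2D slice embeddings from all three anatomical planes.

    volume       : 3D array (the MRI scan)
    encode_slice : callable mapping a 2D slice to a D-dim embedding
                   (stand-in for BiomedCLIP's image encoder)
    w_q          : D-dim learnable query vector (assumed pooling mechanism)
    """
    dim = w_q.shape[0]
    views = []
    for axis in range(3):  # 0: sagittal, 1: coronal, 2: axial (orientation-dependent)
        # Sample n_slices evenly spaced slices along this axis.
        idxs = np.linspace(0, volume.shape[axis] - 1, n_slices).astype(int)
        for i in idxs:
            views.append(encode_slice(np.take(volume, i, axis=axis)))
    emb = np.stack(views)                 # (3 * n_slices, D)
    scores = emb @ w_q / np.sqrt(dim)     # scaled attention logits per view
    alpha = softmax(scores)               # attention weights over views
    return alpha @ emb                    # weighted sum: one D-dim style vector
```

In the actual framework this style vector would condition the diffusion-based harmonizer, while anatomy is carried separately (e.g., via the style-agnostic gradient conditioning the abstract mentions).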