🤖 AI Summary
ComBAT, widely used for harmonizing multi-site diffusion MRI (dMRI) data, can yield biased corrections when its core assumptions (e.g., independence between site effects and covariates) are violated, particularly when demographic distributions (e.g., age) are imbalanced across sites. To study this, the authors use Pairwise-ComBAT, a slightly modified version of ComBAT tailored for normative modeling, and systematically quantify how population size, age distribution, missing covariates, and the magnitude of additive and multiplicative site biases affect harmonization performance. Leveraging both simulated and real multi-site dMRI data, they conduct statistical diagnostics and sensitivity analyses within a linear mixed-effects modeling framework. The work identifies the demographic prerequisites for valid ComBAT application, aims to improve cross-site consistency of neuroimaging biomarkers, and provides five reproducibility-focused best-practice recommendations. These contributions support methodological rigor in open-science collaboration and clinical translation of multi-site dMRI studies.
📝 Abstract
Over the years, ComBAT has become the standard method for harmonizing MRI-derived measurements, owing to its ability to compensate for site-related additive and multiplicative biases while preserving biological variability. However, ComBAT relies on a set of assumptions that, when violated, can result in flawed harmonization. In this paper, we thoroughly review ComBAT's mathematical foundation, outline these assumptions, and explore their implications for the demographic composition necessary for optimal results. Through a series of experiments involving Pairwise-ComBAT, a slightly modified version of ComBAT tailored for normative modeling applications, we assess the impact of various population characteristics, including population size, age distribution, the absence of certain covariates, and the magnitude of the additive and multiplicative factors. Based on these experiments, we present five essential recommendations that should be carefully considered to enhance consistency and support reproducibility, two essential factors for open science, collaborative research, and real-life clinical deployment.
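To make the "additive and multiplicative biases" concrete, the following is a minimal sketch of the location-scale idea behind ComBAT-style harmonization, assuming a toy model y = alpha + beta * age + gamma_site + delta_site * noise. It is not the paper's Pairwise-ComBAT and omits ComBAT's empirical Bayes shrinkage. All names (`simulate_site`, `harmonize_location_scale`) and all numeric values are hypothetical and chosen only for illustration. Both toy sites share the same age range, which is exactly the kind of demographic balance the abstract says the method depends on.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_site(n, gamma, delta, alpha=0.8, beta=-0.002, sigma=0.02):
    """Toy data for one site: y = alpha + beta*age + gamma + delta*eps (assumed model)."""
    age = rng.uniform(20, 80, size=n)
    eps = rng.normal(0.0, sigma, size=n)
    return age, alpha + beta * age + gamma + delta * eps

def harmonize_location_scale(y, age, site):
    """Remove per-site additive (gamma) and multiplicative (delta) effects.

    Simplified sketch: ordinary least squares, no empirical Bayes step.
    """
    sites, idx = np.unique(site, return_inverse=True)
    onehot = np.eye(len(sites))[idx]
    # Design matrix: one intercept per site plus a shared age slope.
    X = np.column_stack([onehot, age])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercepts, beta = coef[:-1], coef[-1]
    counts = onehot.sum(axis=0)
    # Grand intercept is the sample-size-weighted mean of the site intercepts.
    alpha = np.sum(counts * intercepts) / counts.sum()
    gamma = intercepts - alpha                      # additive site shifts
    resid = y - X @ coef
    pooled_sd = np.sqrt(np.sum(resid**2) / (len(y) - X.shape[1]))
    # Multiplicative site factors: per-site residual spread relative to pooled spread.
    delta = np.array([resid[idx == k].std() for k in range(len(sites))]) / pooled_sd
    # Keep the covariate effect, drop the site shift, rescale the residuals.
    y_h = alpha + beta * age + resid / delta[idx]
    return y_h, gamma, delta

# Two toy sites with the same age range but different site biases.
age0, y0 = simulate_site(500, gamma=0.0, delta=1.0)
age1, y1 = simulate_site(500, gamma=0.05, delta=1.5)
age = np.concatenate([age0, age1])
y = np.concatenate([y0, y1])
site = np.repeat([0, 1], 500)

y_h, gamma, delta = harmonize_location_scale(y, age, site)
print("estimated additive shifts:", gamma)
print("estimated scale factors:", delta)
```

Because the site intercepts and the age slope are estimated jointly, this sketch can only separate site effects from age effects when the sites' age distributions overlap. With disjoint age ranges the two become hard to tell apart, which illustrates why demographic composition matters.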