Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution

📅 2026-04-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work examines a limitation of latent diffusion models for medical image super-resolution: they typically inherit general-purpose variational autoencoders (VAEs), such as the one in Stable Diffusion, that were designed for natural photographs. The study shows that the VAE, not the diffusion architecture, is the dominant constraint on reconstruction quality. Holding the rest of the pipeline fixed, the authors replace the generic VAE with MedVAE, a domain-specific autoencoder pretrained on more than 1.6 million medical images. Experiments on knee MRI, brain MRI, and chest X-ray (n = 1,820) show PSNR improvements of 2.91 to 3.29 dB, with comparable hallucination rates between methods. Wavelet decomposition localises the gain to the finest spatial frequency bands, and ablations across inference schedules, prediction targets, and generative architectures show the gap is stable within ±0.15 dB. Autoencoder reconstruction quality, measurable without diffusion training, predicts downstream super-resolution performance (R² = 0.67).

📝 Abstract
Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on reconstruction quality. In a controlled experiment holding all other pipeline components fixed, replacing the generic Stable Diffusion VAE with MedVAE, a domain-specific autoencoder pretrained on more than 1.6 million medical images, yields +2.91 to +3.29 dB PSNR improvement across knee MRI, brain MRI, and chest X-ray (n = 1,820; Cohen's d = 1.37 to 1.86, all p < 10^{-20}, Wilcoxon signed-rank). Wavelet decomposition localises the advantage to the finest spatial frequency bands encoding anatomically relevant fine structure. Ablations across inference schedules, prediction targets, and generative architectures confirm the gap is stable within ±0.15 dB, while hallucination rates remain comparable between methods (Cohen's h < 0.02 across all datasets), establishing that reconstruction fidelity and generative hallucination are governed by independent pipeline components. These results provide a practical screening criterion: autoencoder reconstruction quality, measurable without diffusion training, predicts downstream SR performance (R^2 = 0.67), suggesting that domain-specific VAE selection should precede diffusion architecture search. Code and trained model weights are publicly available at https://github.com/sebasmos/latent-sr.
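
The screening criterion in the abstract relies on measuring autoencoder reconstruction quality before any diffusion training. The following is a minimal sketch of that kind of measurement, not the authors' code: it computes PSNR per image and the mean paired PSNR gap between two sets of reconstructions. The function names `psnr` and `mean_psnr_gap` are hypothetical, the images are assumed to be scaled to [0, 1], and the two "reconstructions" are synthetic noisy copies standing in for the outputs of two real autoencoders.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; assumes intensities span [0, data_range]."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def mean_psnr_gap(references, estimates_a, estimates_b, data_range=1.0):
    """Mean per-image PSNR difference (A minus B) over paired images."""
    gaps = [
        psnr(ref, a, data_range) - psnr(ref, b, data_range)
        for ref, a, b in zip(references, estimates_a, estimates_b)
    ]
    return float(np.mean(gaps))


# Synthetic stand-ins (assumption): two "autoencoders" modelled as additive
# Gaussian noise of different strength on the same reference images.
rng = np.random.default_rng(0)
refs = rng.random((8, 64, 64))
recon_a = refs + rng.normal(0.0, 0.035, refs.shape)  # lower-error reconstructions
recon_b = refs + rng.normal(0.0, 0.050, refs.shape)  # higher-error reconstructions

gap = mean_psnr_gap(refs, recon_a, recon_b)
print(f"mean PSNR gap (A - B): {gap:.2f} dB")
```

In practice, `recon_a` and `recon_b` would be replaced by the encode-then-decode outputs of the two autoencoders under comparison on a held-out set of medical images.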
Problem

Research questions and friction points this paper is trying to address.

medical image super-resolution
latent diffusion models
domain-specific representations
variational autoencoders
reconstruction fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-specific VAE
medical image super-resolution
latent diffusion models
reconstruction fidelity
generative hallucination
Sebastian Cajas
MIT Critical Data, Massachusetts Institute of Technology, Cambridge, MA, USA.
Ashaba Judith
Department of Biomedical Engineering, Mbarara University of Science and Technology, Uganda.
Rahul Gorijavolu
MIT Critical Data, Massachusetts Institute of Technology, Cambridge, MA, USA.
Sahil Kapadia
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Hillary Clinton Kasimbazi
Department of Radiology and Radiotherapy, Makerere University, Uganda.
Leo Kinyera
Department of Biomedical Engineering, Mbarara University of Science and Technology, Uganda.
Emmanuel Paul Kwesiga
Department of Applied Natural Sciences, Technical University of Applied Sciences Lübeck (TH Lübeck), Lübeck, Germany.
Sri Sri Jaithra Varma Manthena
MIT Critical Data, Massachusetts Institute of Technology, Cambridge, MA, USA.
Luis Filipe Nakayama
Visiting Student, Massachusetts Institute of Technology
Ophthalmology, Retina, Artificial Intelligence, Data Science
Ninsiima Doreen
Department of Biomedical Engineering, Mbarara University of Science and Technology, Uganda.
Leo Anthony Celi
Massachusetts Institute of Technology