From Transthoracic to Transesophageal: Cross-Modality Generation using LoRA Diffusion

📅 2025-08-18
🤖 AI Summary
Transesophageal echocardiography (TEE) data are scarce, which hinders cross-modality image generation. This paper adapts a mask-conditioned diffusion model pretrained on transthoracic echocardiography (TTE) to TEE by fine-tuning it with Low-Rank Adaptation (LoRA). The method has three key components: (1) MaskR², a lightweight remapping layer that aligns new anatomical mask formats with the pretrained model's conditioning channels; (2) LoRA adaptation applied only to the MLP layers of the diffusion model, which keeps the adapters small; and (3) a mask-guided generation scheme paired with hybrid training on real and synthetic frames. When mixed with fewer than 200 real TEE frames, the synthetic images improve Dice scores on multi-structure cardiac segmentation, with the clearest gains on underrepresented right-heart structures.

📝 Abstract
Deep diffusion models excel at realistic image synthesis but demand large training sets, an obstacle in data-scarce domains like transesophageal echocardiography (TEE). While synthetic augmentation has boosted performance in transthoracic echo (TTE), TEE remains critically underrepresented, limiting the reach of deep learning in this high-impact modality. We address this gap by adapting a TTE-trained, mask-conditioned diffusion backbone to TEE with only a limited number of new cases and adapters as small as $10^5$ parameters. Our pipeline combines Low-Rank Adaptation with MaskR$^2$, a lightweight remapping layer that aligns novel mask formats with the pretrained model's conditioning channels. This design lets users adapt models to new datasets whose set of anatomical structures differs from the base model's original set. Through a targeted adaptation strategy, we find that adapting only MLP layers suffices for high-fidelity TEE synthesis. Finally, mixing fewer than 200 real TEE frames with our synthetic echoes improves the Dice score on a multiclass segmentation task, particularly boosting performance on underrepresented right-heart structures. Our results demonstrate that (1) semantically controlled TEE images can be generated with low overhead, (2) MaskR$^2$ effectively transforms unseen mask formats into compatible formats without damaging downstream task performance, and (3) our method generates images that improve performance on a downstream multiclass segmentation task.
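To give a feel for how MLP-only adapters can land near the $10^5$-parameter scale quoted in the abstract, here is a minimal counting sketch. It relies only on the standard LoRA construction, in which a frozen weight of shape d_out x d_in receives a rank-r update B A that adds r(d_in + d_out) trainable parameters. The layer widths, expansion factor, and rank below are hypothetical choices for illustration and are not taken from the paper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    The frozen weight W (d_out x d_in) gets an update B @ A, where
    A has shape (rank x d_in) and B has shape (d_out x rank).
    """
    return rank * (d_in + d_out)


def dense_param_count(d_in: int, d_out: int) -> int:
    """Weights in the frozen dense layer itself (bias ignored)."""
    return d_in * d_out


# Hypothetical MLP blocks as (hidden width, expansion factor); not from the paper.
mlp_blocks = [(320, 4), (640, 4), (1280, 4)]
rank = 4  # hypothetical LoRA rank

adapter_total = 0
dense_total = 0
for width, expansion in mlp_blocks:
    inner = width * expansion
    # Each MLP block has an up-projection and a down-projection.
    for d_in, d_out in ((width, inner), (inner, width)):
        adapter_total += lora_param_count(d_in, d_out, rank)
        dense_total += dense_param_count(d_in, d_out)

print(f"adapter parameters: {adapter_total}")
print(f"frozen MLP weights: {dense_total}")
print(f"adapter fraction:   {adapter_total / dense_total:.4%}")
```

With these assumed sizes the adapters account for well under one percent of the MLP weights they modify, which is the kind of saving that restricting LoRA to MLP layers is meant to provide.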
Problem

Research questions and friction points this paper is trying to address.

Generate TEE images using limited data and small adapters
Align new mask formats with the pretrained model via MaskR²
Improve segmentation performance with synthetic TEE images
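Segmentation performance here is reported as a Dice score. As a reminder of what that metric measures, the sketch below computes per-class Dice, 2|A ∩ B| / (|A| + |B|), on flattened label maps. The class indices and the toy masks are hypothetical, and this is not the paper's evaluation code.

```python
def dice_per_class(pred, target, num_classes):
    """Per-class Dice between two flat label sequences of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    scores = {}
    for c in range(num_classes):
        pred_c = sum(1 for p in pred if p == c)
        target_c = sum(1 for t in target if t == c)
        overlap = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        denom = pred_c + target_c
        # A class absent from both masks has no defined Dice; report None.
        scores[c] = None if denom == 0 else 2 * overlap / denom
    return scores


def mean_dice(scores):
    """Average Dice over the classes that have a defined score."""
    present = [s for s in scores.values() if s is not None]
    return sum(present) / len(present) if present else None


# Hypothetical labels: 0 = background, 1 = left ventricle, 2 = right ventricle.
target = [0, 0, 1, 1, 1, 2, 2, 0]
pred = [0, 1, 1, 1, 0, 2, 0, 0]

scores = dice_per_class(pred, target, num_classes=3)
for label, score in scores.items():
    print(label, score)
print("mean:", mean_dice(scores))
```

Reporting the per-class values alongside the mean matters for this task, because an average can hide poor performance on rare structures such as the right-heart chambers.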
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA Diffusion for cross-modality image generation
MaskR² aligns novel mask formats efficiently
Adapts MLP layers for high-fidelity TEE synthesis
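The summary does not describe how MaskR² works internally. One plausible reading of "a lightweight remapping layer that aligns novel mask formats with the pretrained model's conditioning channels" is a per-pixel linear mix of one-hot mask channels, similar to a 1x1 convolution without bias. The sketch below illustrates that assumption only. The function names, class layout, and hand-set weights are hypothetical, and in a trained layer the weights would be learned rather than fixed.

```python
def one_hot(label_map, num_classes):
    """Turn an H x W label map into num_classes binary H x W channels."""
    return [
        [[1.0 if value == c else 0.0 for value in row] for row in label_map]
        for c in range(num_classes)
    ]


def remap_channels(channels, weights):
    """Mix input channels into output channels, pixel by pixel.

    weights[o][i] is the contribution of input channel i to output
    channel o, which matches a 1x1 convolution with no bias term.
    """
    height = len(channels[0])
    width = len(channels[0][0])
    return [
        [
            [sum(w * ch[y][x] for w, ch in zip(w_row, channels)) for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


# Hypothetical new mask format with 4 classes on a 2 x 2 image.
new_mask = [[0, 1],
            [2, 3]]

# Hypothetical base model that expects 3 conditioning channels.
# Rows are output channels, columns are input classes.
weights = [
    [1.0, 0.0, 0.0, 0.0],  # class 0 passes straight through
    [0.0, 1.0, 0.0, 0.0],  # class 1 passes straight through
    [0.0, 0.0, 0.5, 0.5],  # classes 2 and 3 share one base channel
]

conditioning = remap_channels(one_hot(new_mask, num_classes=4), weights)
for channel in conditioning:
    print(channel)
```

Under this assumption the remapping adds only (output channels x input classes) parameters, so it stays lightweight compared with the diffusion backbone it feeds.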
Authors

Emmanuel Oladokun, University of Oxford
Yuxuan Ou, University of Oxford
Anna Novikova, GE HealthCare, Cardiovascular Ultrasound R&D
Daria Kulikova, GE HealthCare, Cardiovascular Ultrasound R&D
Sarina Thomas, GE HealthCare, Cardiovascular Ultrasound R&D
Jurica Šprem, GE HealthCare, Cardiovascular Ultrasound R&D
Vicente Grau, University of Oxford (Medical image analysis; Computational modelling in biomedicine)