From Transthoracic to Transesophageal: Cross-Modality Generation using LoRA Diffusion

📅 2025-08-18

🤖 AI Summary

To address the scarcity of transesophageal echocardiography (TEE) data, which hinders cross-modality generation, this paper proposes a lightweight diffusion framework fine-tuned via Low-Rank Adaptation (LoRA) for high-fidelity TTE-to-TEE ultrasound image synthesis. The method has three key components: (1) MaskR², a lightweight remapping layer that unifies anatomical masks in different formats and aligns them with the pretrained model's conditioning channels; (2) LoRA adaptation applied only to the MLP layers of the diffusion model, which keeps computational overhead low; and (3) mask-guided generation combined with hybrid training on real and synthetic frames. With fewer than 200 real TEE frames, adding the synthesized data improves multi-structure cardiac segmentation: the average Dice score increases, and segmentation of underrepresented structures, particularly the right-heart chambers, becomes more robust.

📝 Abstract

Deep diffusion models excel at realistic image synthesis but demand large training sets, an obstacle in data-scarce domains like transesophageal echocardiography (TEE). While synthetic augmentation has boosted performance in transthoracic echo (TTE), TEE remains critically underrepresented, limiting the reach of deep learning in this high-impact modality. We address this gap by adapting a TTE-trained, mask-conditioned diffusion backbone to TEE with only a limited number of new cases and adapters as small as $10^5$ parameters. Our pipeline combines Low-Rank Adaptation with MaskR$^2$, a lightweight remapping layer that aligns novel mask formats with the pretrained model's conditioning channels. This design lets users adapt models to new datasets whose set of anatomical structures differs from the base model's original set. Through a targeted adaptation strategy, we find that adapting only MLP layers suffices for high-fidelity TEE synthesis. Finally, mixing fewer than 200 real TEE frames with our synthetic echoes improves the Dice score on a multiclass segmentation task, particularly boosting performance on underrepresented right-heart structures. Our results demonstrate that (1) semantically controlled TEE images can be generated with low overhead, (2) MaskR$^2$ effectively transforms unseen mask formats into compatible formats without damaging downstream task performance, and (3) the generated images improve performance on a downstream multiclass segmentation task.
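To make the Low-Rank Adaptation idea in the abstract concrete, the following is a minimal pure-Python sketch of a single LoRA-augmented linear layer. It is not the authors' implementation: the class name `LoRALinear`, the helper `lora_param_count`, the initialization scale, and the `alpha / rank` scaling are illustrative assumptions that follow the usual LoRA formulation, in which a frozen weight W is augmented with a trainable low-rank product B·A.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W plus a trainable update (alpha / rank) * B @ A.

    Hypothetical illustration; names and defaults are assumptions.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weights, kept frozen
        # A starts with small random values and B with zeros, so the
        # adapted layer reproduces the pretrained layer before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A and B are trained; W is excluded from the count.
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter of the given rank."""
    return rank * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
output = layer([1.0, 1.0])  # equals the frozen layer's output at initialization
```

As an arithmetic illustration only, a hypothetical 512-to-2048 MLP projection adapted at rank 4 would add 4 × (512 + 2048) = 10,240 trainable parameters, which shows how restricting adapters to a handful of MLP layers can keep the total small. The layer sizes here are made up and are not taken from the paper.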

Problem

Research questions and friction points this paper is trying to address.

Generate TEE images using limited data and small adapters

Align new mask formats with the pretrained model's conditioning channels via MaskR²

Improve segmentation performance with synthetic TEE images
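The segmentation improvement is reported as a Dice score on a multiclass task. As a reminder of what that metric measures, here is a minimal sketch of a per-class Dice computation on flattened label maps. The function name `dice_per_class` and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the paper's exact evaluation protocol is not described in this summary.

```python
def dice_per_class(pred, target, num_classes):
    """Per-class Dice score, 2 * |P ∩ T| / (|P| + |T|), on flat label lists.

    Classes that appear in neither list are skipped (an assumed convention).
    """
    scores = {}
    for c in range(num_classes):
        pred_c = [label == c for label in pred]
        target_c = [label == c for label in target]
        intersection = sum(p and t for p, t in zip(pred_c, target_c))
        denominator = sum(pred_c) + sum(target_c)
        if denominator == 0:
            continue  # class absent from both prediction and ground truth
        scores[c] = 2 * intersection / denominator
    return scores


# Toy example: four pixels, three classes.
scores = dice_per_class([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
mean_dice = sum(scores.values()) / len(scores)
```

Averaging per class, rather than over all pixels, is what makes rare structures such as the right-heart chambers count as much as large, common ones.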

Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA Diffusion for cross-modality image generation

MaskR² aligns novel mask formats efficiently

Adapts only the MLP layers for high-fidelity TEE synthesis
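The summary describes MaskR² only as a lightweight remapping layer that aligns new mask formats with the pretrained model's conditioning channels; its internal design is not given here. One plausible way to picture such a layer is a per-pixel linear map from one-hot labels in the new format to the base model's channels, equivalent to a 1×1 convolution. The sketch below is that assumption, not the authors' architecture, and `one_hot`, `remap_mask`, and the example weights are hypothetical.

```python
def one_hot(label, num_classes):
    """Encode an integer label as a one-hot vector of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def remap_mask(mask, weights):
    """Map a 2D label mask in a new format onto base-model channels.

    mask:    H x W nested list of integer labels in the new format.
    weights: (base_channels x new_classes) matrix; learned in practice,
             hand-written here for illustration.
    Returns an H x W x base_channels nested list.
    """
    num_new_classes = len(weights[0])
    remapped = []
    for row in mask:
        out_row = []
        for label in row:
            x = one_hot(label, num_new_classes)
            out_row.append([sum(w * v for w, v in zip(w_row, x)) for w_row in weights])
        remapped.append(out_row)
    return remapped


# Hypothetical setup: the new dataset has 3 labels, the base model expects
# 2 conditioning channels, and new labels 1 and 2 both feed base channel 1.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]
conditioning = remap_mask([[0, 1], [2, 0]], weights)
```

With trainable weights, a map of this kind could be learned jointly with the adapters, so a dataset with a different set of anatomical structures could reuse the pretrained conditioning pathway without retraining the backbone.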
