🤖 AI Summary
To address the scarcity of transesophageal echocardiography (TEE) data, which hinders cross-modal generation, this paper proposes a lightweight diffusion model framework fine-tuned via Low-Rank Adaptation (LoRA) for high-fidelity synthesis of TEE ultrasound images from a model pretrained on transthoracic echocardiography (TTE). The method introduces three key innovations: (1) MaskR², a lightweight remapping layer that unifies multi-format anatomical masks and aligns them with the pretrained model's conditioning channels; (2) LoRA adaptation applied exclusively to the MLP layers of the diffusion model, which keeps the number of trainable parameters and the computational overhead low; and (3) a mask-guided generation scheme coupled with hybrid training on real and synthetic data. When mixed with fewer than 200 real TEE frames, the synthesized data improves multi-structure cardiac segmentation, raising the average Dice score and making the segmentation of rare structures, particularly the right heart chambers, more robust.
📝 Abstract
Deep diffusion models excel at realistic image synthesis but demand large training sets, an obstacle in data-scarce domains such as transesophageal echocardiography (TEE). While synthetic augmentation has boosted performance in transthoracic echo (TTE), TEE remains critically underrepresented, limiting the reach of deep learning in this high-impact modality.
We address this gap by adapting a TTE-trained, mask-conditioned diffusion backbone to TEE with only a limited number of new cases and adapters as small as $10^5$ parameters. Our pipeline combines Low-Rank Adaptation with MaskR$^2$, a lightweight remapping layer that aligns novel mask formats with the pretrained model's conditioning channels. This design lets users adapt models to new datasets whose set of anatomical structures differs from the base model's original set.
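To make the adapter size concrete, the following is a minimal sketch of a generic low-rank adapter on a single linear layer. It is not the authors' implementation: the class `LoRALinear`, the helper `lora_param_count`, the layer sizes (512 and 2048), the rank of 4, and the count of ten adapted layers are all hypothetical values chosen for illustration. The frozen weight $W$ is left untouched and only the two small factors $A$ and $B$ would be trained.

```python
import numpy as np


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter on a d_in -> d_out linear layer."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen linear layer W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable factor
        self.B = np.zeros((d_out, rank))                    # trainable, zero init: update starts at 0
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, d_in); the second term is the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T


# Hypothetical sizes, not taken from the paper: ten 512 -> 2048 layers at rank 4.
per_layer = lora_param_count(512, 2048, rank=4)
total = 10 * per_layer
```

Under these assumed sizes each adapter adds 10,240 parameters and ten of them add 102,400, which is on the order of $10^5$. Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly.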
Through a targeted adaptation strategy, we find that adapting only MLP layers suffices for high-fidelity TEE synthesis. Finally, mixing fewer than 200 real TEE frames with our synthetic echoes improves the Dice score on a multiclass segmentation task, particularly boosting performance on underrepresented right-heart structures. Our results demonstrate that (1) semantically controlled TEE images can be generated with low overhead, (2) MaskR$^2$ effectively transforms unseen mask formats into compatible formats without damaging downstream task performance, and (3) the generated images improve performance on the downstream multiclass segmentation task.
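For readers unfamiliar with the metric, the sketch below computes a per-class Dice score, $2|P \cap T| / (|P| + |T|)$, on integer label maps. The function name `dice_per_class` and the convention of returning NaN for a class absent from both maps are assumptions for illustration; the paper's exact evaluation protocol is not stated in the abstract.

```python
import numpy as np


def dice_per_class(pred, target, num_classes):
    """Dice = 2|P ∩ T| / (|P| + |T|) for each class label in integer label maps."""
    scores = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            # Class absent from both prediction and ground truth: undefined,
            # so it is excluded from the average via NaN.
            scores.append(float("nan"))
            continue
        scores.append(2.0 * np.logical_and(p, t).sum() / denom)
    return scores


# Toy 2x2 label maps with three classes (background plus two structures).
pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
scores = dice_per_class(pred, target, num_classes=3)
mean_dice = float(np.nanmean(scores))
```

Reporting the per-class values alongside the mean matters here because an average can hide poor performance on small or rare structures such as the right-heart chambers.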