🤖 AI Summary
This study addresses the scarcity of lung ultrasound (LUS) image data and the tendency of existing generative methods to lose critical diagnostic details, such as B-lines and pleural irregularities, by proposing a diffusion-based synthesis framework called AWDiff. The approach integrates the à trous wavelet transform into medical image generation to avoid the structural information loss typically caused by downsampling, and it incorporates the BioMedCLIP vision-language model for semantic conditioning that keeps generated images aligned with clinically meaningful labels. On a LUS dataset, the generated images show lower distortion and higher perceptual quality than those of existing methods, preserving anatomical fidelity while providing clinical diversity.
📝 Abstract
Lung ultrasound (LUS) is a safe and portable imaging modality, but data scarcity limits the development of machine learning methods for image interpretation and disease monitoring. Existing generative augmentation methods, such as Generative Adversarial Networks (GANs) and diffusion models, often reduce image resolution and thereby lose subtle diagnostic cues, particularly B-lines and pleural irregularities. We propose À trous Wavelet Diffusion (AWDiff), a diffusion-based augmentation framework that integrates the à trous wavelet transform to preserve fine-scale structures while avoiding destructive downsampling. In addition, semantic conditioning with BioMedCLIP, a vision-language foundation model trained on large-scale biomedical corpora, enforces alignment with clinically meaningful labels. On a LUS dataset, AWDiff achieved lower distortion and higher perceptual quality than existing methods, demonstrating both structural fidelity and clinical diversity.
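To make the "no downsampling" property concrete, the following is a minimal sketch of a generic à trous (undecimated) wavelet decomposition in NumPy. It is not the AWDiff implementation: the B3-spline kernel, the reflect padding, and the function names `atrous_decompose` and `atrous_reconstruct` are assumptions chosen for illustration. The sketch shows that every detail plane keeps the full input resolution and that the planes sum back to the original image.

```python
import numpy as np

# B3-spline low-pass kernel; a common choice for the a trous algorithm
# (assumption: the abstract does not specify which kernel AWDiff uses).
B3 = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0


def _smooth_axis(img, step, axis):
    """Filter along one axis with the B3 kernel dilated by `step`.

    Dilation inserts `step - 1` zeros ("holes") between kernel taps, so the
    filter widens at each level while the image itself is never downsampled.
    """
    pad = 2 * step  # the kernel has 5 taps, so it reaches 2 * step each side
    pad_width = [(0, 0)] * img.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(img, pad_width, mode="reflect")  # boundary handling is an assumption
    n = img.shape[axis]
    out = np.zeros_like(img, dtype=np.float64)
    for k, w in enumerate(B3):
        start = k * step
        out += w * np.take(padded, np.arange(start, start + n), axis=axis)
    return out


def atrous_decompose(img, levels=3):
    """Return (detail planes, residual), all with the same shape as `img`."""
    c = np.asarray(img, dtype=np.float64)
    details = []
    for j in range(levels):
        step = 2 ** j
        smooth = _smooth_axis(_smooth_axis(c, step, axis=0), step, axis=1)
        details.append(c - smooth)  # detail plane at scale j
        c = smooth
    return details, c


def atrous_reconstruct(details, residual):
    """Invert the decomposition by summing all planes."""
    return residual + sum(details)


# Usage on a synthetic stand-in for an ultrasound frame (hypothetical data).
rng = np.random.default_rng(0)
img = rng.random((64, 64))
details, residual = atrous_decompose(img, levels=3)
recon = atrous_reconstruct(details, residual)
```

Because each plane has the same spatial size as the input, thin structures such as B-lines are not averaged away by a decimation step, which is the property the abstract attributes to the à trous transform. How AWDiff feeds these planes into the diffusion model is not described in the abstract and is not shown here.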