๐ค AI Summary
Lesion subtype recognition in breast ultrasound suffers from severe data imbalance due to long-tailed class distributions. To address this, we propose a two-stage adaptive framework: (1) a sketch-guided controllable generative network incorporating anatomical priors to ensure category fidelity of synthesized images; and (2) a reinforcement learningโdriven multi-agent sampler that dynamically optimizes the ratio of real to synthetic samples during training. Additionally, we introduce an unlabeled inference mechanism to enhance generalization. Evaluated on both private long-tailed and public imbalanced breast ultrasound datasets, our method outperforms existing state-of-the-art approaches, achieving an average 4.2% improvement in F1-score. This demonstrates the effectiveness of synergistically combining generative data augmentation with adaptive sampling to mitigate long-tail bias.
๐ Abstract
Accurate identification of breast lesion subtypes can facilitate personalized treatment and interventions. Ultrasound (US), as a safe and accessible imaging modality, is extensively employed in breast abnormality screening and diagnosis. However, the incidence of different subtypes exhibits a skewed long-tailed distribution, posing significant challenges for automated recognition. Generative augmentation provides a promising solution to rectify data distribution. Inspired by this, we propose a dual-phase framework for long-tailed classification that mitigates distributional bias through high-fidelity data synthesis while avoiding overuse that corrupts holistic performance. The framework incorporates a reinforcement learning-driven adaptive sampler, dynamically calibrating synthetic-real data ratios by training a strategic multi-agent to compensate for scarcities of real data while ensuring stable discriminative capability. Furthermore, our class-controllable synthetic network integrates a sketch-grounded perception branch that harnesses anatomical priors to maintain distinctive class features while enabling annotation-free inference. Extensive experiments on an in-house long-tailed and a public imbalanced breast US datasets demonstrate that our method achieves promising performance compared to state-of-the-art approaches. More synthetic images can be found at https://github.com/Stinalalala/Breast-LT-GenAug.