🤖 AI Summary
This study addresses the detection of chromosomal structural abnormalities, a task in which the scarcity of real abnormal samples and extreme class imbalance severely limit deep learning performance. The authors propose a simulation-driven structural augmentation framework that generates synthetic abnormalities by perturbing the banding patterns of normal chromosomes and then uses a diffusion network to restore structural continuity, removing the reliance on rare abnormal samples. An energy score–guided adaptive sampling mechanism dynamically selects high-quality synthetic samples during training by referencing the energy distribution of real samples. Evaluated on a dataset of over 260,000 chromosome images, the method achieves state-of-the-art performance, improving average sensitivity, precision, and F1-score by 8.92%, 8.89%, and 13.79%, respectively, and represents the first application of energy distribution–guided dynamic sampling in chromosomal abnormality detection.
📝 Abstract
Detecting structural chromosomal abnormalities is crucial for accurate diagnosis and management of genetic disorders. However, collecting sufficient structural abnormality data is extremely challenging and costly in clinical practice, and not all abnormal types can be readily collected. As a result, deep learning approaches suffer significant performance degradation due to the severe imbalance and scarcity of abnormal chromosome data. To address this challenge, we propose Perturb-and-Restore (P&R), a simulation-driven structural augmentation framework that effectively alleviates data imbalance in chromosome anomaly detection. The P&R framework comprises two key components: (1) Structure Perturbation and Restoration Simulation, which generates synthetic abnormal chromosomes by perturbing the banding patterns of normal chromosomes and then applying a restoration diffusion network that reconstructs continuous chromosome content and edges, thus eliminating reliance on rare abnormal samples; and (2) Energy-guided Adaptive Sampling, an energy score-based online selection strategy that dynamically prioritizes high-quality synthetic samples by referencing the energy distribution of real samples. To evaluate our method, we construct a comprehensive structural anomaly dataset consisting of over 260,000 chromosome images, including 4,242 abnormal samples spanning 24 categories. Experimental results demonstrate that the P&R framework achieves state-of-the-art (SOTA) performance, surpassing existing methods with an average improvement of 8.92% in sensitivity, 8.89% in precision, and 13.79% in F1-score across all categories.
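As an illustration of the Energy-guided Adaptive Sampling idea, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used free-energy score E(x) = -T · logsumexp(logits / T) computed from classifier logits, and it assumes that a synthetic sample counts as "high-quality" when its energy falls inside a central percentile band of the real-sample energies. The function names (`energy_score`, `percentile`, `select_synthetic`) and the 5th/95th percentile thresholds are hypothetical choices made for this example; the abstract does not specify the exact score or selection rule.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free-energy score E(x) = -T * logsumexp(logits / T) for one sample.

    Assumption: this is the standard logit-based energy score; the paper's
    exact definition is not given in the abstract.
    """
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(v - m) for v in scaled))
    return -temperature * lse


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def select_synthetic(real_logits, synth_logits, low_q=5.0, high_q=95.0):
    """Keep synthetic samples whose energy lies within the real-energy band.

    Hypothetical rule: the band is the [low_q, high_q] percentile range of
    the energies of real samples.
    """
    real_e = sorted(energy_score(row) for row in real_logits)
    band_lo = percentile(real_e, low_q)
    band_hi = percentile(real_e, high_q)
    synth_e = [energy_score(row) for row in synth_logits]
    keep = [i for i, e in enumerate(synth_e) if band_lo <= e <= band_hi]
    return keep, synth_e


# Toy usage with made-up two-class logits.
real = [[float(i), 0.0] for i in range(11)]
synth = [[5.0, 0.0], [100.0, 0.0]]
keep, energies = select_synthetic(real, synth)
print(keep)  # prints [0]: the second sample's energy is far outside the band
```

In an online setting, a selection step of this kind would be re-run as the model's logits change during training, so the set of retained synthetic samples adapts over time; how often and on which real samples the reference distribution is computed are details the abstract leaves open.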