🤖 AI Summary
Laryngoscopic image scarcity and insufficient annotation severely limit the generalizability of computer-aided diagnosis and explanation (CADx/e) systems in otolaryngology. To address this, we propose a clinically guided synthetic data generation framework whose novelty lies in integrating latent diffusion models (LDMs) with ControlNet while explicitly incorporating anatomical structures and pathological features as clinical priors, synthesizing high-fidelity, diverse, and precisely annotated laryngoscopic images. A blinded expert evaluation found no statistically significant perceptual difference between synthetic and real images (p > 0.05). Adding only 10% synthetic data improved internal detection performance by 9% and cross-domain generalization by 22.1%, substantially alleviating the small-sample bottleneck in specialized clinical domains. This work establishes a generalizable, clinically grounded paradigm for trustworthy AI modeling in medical imaging scenarios characterized by data scarcity.
📝 Abstract
Although computer-aided diagnosis (CADx) and detection (CADe) systems have made significant progress in various medical domains, their application is still limited in specialized fields such as otorhinolaryngology. In this field, current assessment methods depend heavily on operator expertise, and the high heterogeneity of lesions complicates diagnosis, with biopsy persisting as the gold standard despite its substantial costs and risks. A critical bottleneck for specialized endoscopic CADx/e systems is the lack of well-annotated datasets with sufficient variability for real-world generalization. This study introduces a novel approach that exploits a Latent Diffusion Model (LDM) coupled with a ControlNet adapter to generate laryngeal endoscopic image-annotation pairs, guided by clinical observations. The method addresses data scarcity by conditioning the diffusion process to produce realistic, high-quality, and clinically relevant image features that capture diverse anatomical conditions. The proposed approach can be leveraged to expand training datasets for CADx/e models, empowering the assessment process in laryngology. Indeed, in a downstream detection task, the addition of only 10% synthetic data improved the detection rate of laryngeal lesions by 9% in internal testing and by 22.1% on out-of-domain external data. Additionally, the realism of the generated images was evaluated by asking five expert otorhinolaryngologists with varying levels of expertise to rate their confidence in distinguishing synthetic from real images. This work has the potential to accelerate the development of automated tools for laryngeal disease diagnosis, offering a solution to data scarcity and demonstrating the applicability of synthetic data in real-world scenarios.
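The key idea of conditioning generation on clinical priors so that each synthetic image ships with a pixel-aligned annotation can be sketched as follows. This is a minimal illustration, not the authors' code: the mask encoding (background/anatomy/lesion values), function names, and the prompt text are all assumptions. The condition map that would steer a ControlNet adapter is reused directly as the annotation for the generated image, which is what makes the pairs "precisely annotated" for free.

```python
import numpy as np

def build_condition_map(anatomy_contour: np.ndarray,
                        lesion_mask: np.ndarray) -> np.ndarray:
    """Combine clinical priors into a single ControlNet-style condition map.

    anatomy_contour, lesion_mask: binary HxW arrays (hypothetical encoding).
    Returns a uint8 HxW map: 0 = background, 128 = anatomy, 255 = lesion.
    """
    cond = np.zeros(anatomy_contour.shape, dtype=np.uint8)
    cond[anatomy_contour > 0] = 128
    cond[lesion_mask > 0] = 255  # lesions drawn on top of anatomy
    return cond

def make_training_pair(cond_map: np.ndarray,
                       prompt: str = "laryngoscopic image, vocal folds"):
    # The condition map guides the diffusion process (via ControlNet) and
    # doubles as the annotation, so every synthetic image is labeled by
    # construction rather than by post-hoc manual annotation.
    return {"condition": cond_map, "annotation": cond_map, "prompt": prompt}

# Toy 4x4 example: one row of anatomy with a single lesion pixel on it.
anatomy = np.zeros((4, 4), dtype=np.uint8)
anatomy[1, :] = 1
lesion = np.zeros((4, 4), dtype=np.uint8)
lesion[1, 2] = 1
pair = make_training_pair(build_condition_map(anatomy, lesion))
```

In a full pipeline, `pair["condition"]` would be passed as the control image to a ControlNet-augmented LDM sampler, and `pair["annotation"]` stored alongside the generated frame as the detection/segmentation label.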