🤖 AI Summary
To address the scarcity of publicly available dental panoramic radiographs (PRs), which hinders AI research and clinical education, this paper proposes PanoDiff, a two-stage generative framework. First, a diffusion model synthesizes low-resolution (256×128) PR "seeds"; second, a Transformer-based super-resolution network reconstructs them into high-resolution (1024×512) images, explicitly modeling both local anatomical structures and global spatial relationships. This paradigm balances generation stability and fine-grained fidelity, achieving a Fréchet Inception Distance (FID) of 40.69 between real and synthetic images. In a blinded clinical evaluation, experts distinguished real from synthetic PRs with only 68.5% accuracy, confirming high perceptual realism. To the authors' knowledge, this is the first work to combine diffusion models with Transformer-based super-resolution for dental image synthesis, establishing a scalable, high-fidelity paradigm for medical image generation.
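The two-stage resolution flow described above can be sketched with placeholder components. This is only a shape-level illustration: a random array stands in for the diffusion model's sampled seed, and nearest-neighbour upsampling stands in for the paper's Transformer-based SR network (neither real model is reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 stand-in: the diffusion model's low-resolution PR "seed".
# A random array replaces the actual PanoDiff sampler (hypothetical placeholder).
lr_seed = rng.random((128, 256))   # H x W = 128 x 256, i.e. a 256x128 image

# Stage 2 stand-in: 4x super-resolution via nearest-neighbour upsampling,
# in place of the Transformer-based SR network from the paper.
hr = lr_seed.repeat(4, axis=0).repeat(4, axis=1)

print(lr_seed.shape, hr.shape)     # -> (128, 256) (512, 1024)
```

The point is the resolution contract between the stages: the SR model maps 256×128 seeds to 1024×512 PRs, a fixed 4× scale factor in each dimension.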
📝 Abstract
There has been increasing interest in the generation of high-quality, realistic synthetic medical images in recent years. Such synthetic datasets can mitigate the scarcity of public datasets for artificial intelligence research, and can also be used for educational purposes. In this paper, we propose a combination of diffusion-based generation (PanoDiff) and Super-Resolution (SR) for generating synthetic dental panoramic radiographs (PRs). The former generates a low-resolution (LR) seed of a PR (256×128), which is then processed by the SR model to yield a high-resolution (HR) PR of size 1024×512. For SR, we propose a state-of-the-art Transformer that learns local-global relationships, resulting in sharper edges and textures. Experimental results demonstrate a Fréchet Inception Distance (FID) of 40.69 between 7243 real and synthetic images (in HR). Inception scores were 2.55, 2.30, 2.90 and 2.98 for real HR, synthetic HR, real LR and synthetic LR images, respectively. Among a diverse group of six clinical experts, each evaluating a mixture of 100 synthetic and 100 real PRs under time-limited viewing, the average accuracy in distinguishing real from synthetic images was 68.5% (with 50% corresponding to random guessing).
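The FID reported above compares Inception-feature statistics of the real and synthetic sets. A minimal sketch of the underlying Fréchet distance between two Gaussians, assuming the feature means and covariances have already been extracted (the Inception feature extractor itself is omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        # Discard tiny imaginary parts from numerical error in sqrtm.
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical statistics give a distance of zero; a mean shift of (3, 4)
# with equal covariances gives the squared distance 25.
print(fid(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)))          # -> 0.0
print(fid(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2))) # -> 25.0
```

In practice the means and covariances are computed over Inception-v3 activations of the 7243 real and synthetic HR images; lower FID indicates that the synthetic distribution sits closer to the real one.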