T-SYNTH: A Knowledge-Based Dataset of Synthetic Breast Images

📅 2025-07-05
🤖 AI Summary
Development of medical imaging algorithms is hindered by limited access to large-scale datasets with pixel-level annotations, particularly for paired 2D digital mammography (DM) and 3D digital breast tomosynthesis (DBT) images. To address this, T-SYNTH uses physics simulations, constrained by plausible physical and biological models, to generate synthetic paired DM and DBT images together with pixel-level segmentation annotations. The result is released as a large-scale open-source dataset. Initial experiments indicate that T-SYNTH images show promise for augmenting limited real patient datasets in DM and DBT detection tasks.

📝 Abstract
One of the key impediments for developing and assessing robust medical imaging algorithms is limited access to large-scale datasets with suitable annotations. Synthetic data generated with plausible physical and biological constraints may address some of these data limitations. We propose the use of physics simulations to generate synthetic images with pixel-level segmentation annotations, which are notoriously difficult to obtain. Specifically, we apply this approach to breast imaging analysis and release T-SYNTH, a large-scale open-source dataset of paired 2D digital mammography (DM) and 3D digital breast tomosynthesis (DBT) images. Our initial experimental results indicate that T-SYNTH images show promise for augmenting limited real patient datasets for detection tasks in DM and DBT. Our data and code are publicly available at https://github.com/DIDSR/tsynth-release.
Problem

Research questions and friction points this paper is trying to address.

Limited access to large-scale annotated medical imaging datasets
Difficulty obtaining pixel-level segmentation annotations in medical images
Need for synthetic data to augment real patient datasets in breast imaging
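
As a rough illustration of the augmentation idea, the sketch below mixes synthetic samples into a small real training set at a chosen fraction. It is not the authors' pipeline: the function name `mix_real_and_synthetic`, the `synthetic_fraction` parameter, and the ID lists are hypothetical and chosen only for this example.

```python
import random


def mix_real_and_synthetic(real_ids, synthetic_ids, synthetic_fraction, seed=0):
    """Build a shuffled training list in which roughly `synthetic_fraction`
    of the samples are synthetic (hypothetical helper, for illustration).

    All real samples are kept; synthetic samples are drawn without
    replacement and capped by the number available.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_real + n_syn) = f for n_syn.
    n_syn = round(synthetic_fraction * len(real_ids) / (1.0 - synthetic_fraction))
    n_syn = min(n_syn, len(synthetic_ids))
    chosen = rng.sample(list(synthetic_ids), n_syn)
    # Tag each sample with its origin so later analysis can separate them.
    mixed = [("real", i) for i in real_ids] + [("synthetic", i) for i in chosen]
    rng.shuffle(mixed)
    return mixed


# Example: 10 real cases, a pool of 100 synthetic cases, half the set synthetic.
training_set = mix_real_and_synthetic(range(10), range(100), synthetic_fraction=0.5)
```

Keeping every real sample and varying only the synthetic share makes it straightforward to compare detection performance across mixing ratios.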
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics simulations generate synthetic breast images
Pixel-level segmentation annotations produced together with the simulated images
Large-scale open-source dataset of paired DM and DBT images
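
One reason pixel-level masks are useful for detection tasks is that bounding boxes can be derived from them directly. The sketch below shows this for a toy binary mask stored as nested lists. It makes no assumption about the actual T-SYNTH file format, and `mask_to_bbox` is a hypothetical helper, not part of the released code.

```python
def mask_to_bbox(mask):
    """Return (row_min, col_min, row_max, col_max), inclusive, covering the
    nonzero pixels of a 2D mask given as nested lists, or None if it is empty."""
    # Rows that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every foreground pixel.
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy 4x4 mask with a small L-shaped foreground region.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = mask_to_bbox(toy_mask)
```

For real image volumes an array library would be the more natural choice. Plain lists are used here only to keep the example dependency-free.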
Christopher Wiedeman
PhD Candidate at Rensselaer Polytechnic Institute
Artificial Intelligence, Medical Imaging, Data Science
Anastasiia Sarmakeeva
Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD 20993 USA
Elena Sizikova
FDA/CDRH/OSEL/DIDSR
Medical Imaging, Artificial Intelligence (AI), Computer Vision, Synthetic Data, Digital Twins
Daniil Filienko
PhD student, University of Washington Tacoma
Privacy Preserving Machine Learning, Large Language Models, Health AI
Miguel Lago
Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD 20993 USA
Jana G. Delfino
Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD 20993 USA
Aldo Badano
FDA
Medical imaging, in silico imaging trials