🤖 AI Summary
Current general-purpose vision foundation models exhibit low parameter efficiency, weak multi-task performance, and insufficient image-text alignment when applied to mammography, hindering high-accuracy, interpretable breast cancer screening. Method: We propose Mammo-FM, the first domain-specific foundation model for mammography, trained on large-scale, multi-center, native-resolution mammograms using an image-text co-alignment architecture. It unifies four key clinical tasks: cancer diagnosis, lesion localization, structured report generation, and risk prediction. Contribution/Results: On multiple public and private benchmarks, Mammo-FM consistently outperforms generalist large vision models despite using only one-third of their parameters. It pairs this accuracy with improved clinical interpretability and auditability, which come from explicit image-text alignment and task-integrated representations. This work positions domain-specific foundation models as a basis for trustworthy, AI-assisted breast cancer screening.
📝 Abstract
Breast cancer is one of the leading causes of death among women worldwide. We introduce Mammo-FM, the first foundation model designed specifically for mammography, pretrained on the largest and most diverse dataset to date: 821,326 mammograms from 140,677 patients across four U.S. institutions. Mammo-FM provides a unified foundation for core clinical tasks in breast imaging, covering cancer diagnosis, pathology localization, structured report generation, and cancer risk prognosis within a single framework. Its alignment between images and text enables both visual and textual interpretability, improving the transparency and clinical auditability that are essential for real-world adoption. We rigorously evaluate Mammo-FM on diagnosis, prognosis, and report-generation tasks using both in-distribution and out-of-distribution datasets. Despite operating on native-resolution mammograms and using only one-third of the parameters of state-of-the-art generalist foundation models, Mammo-FM consistently outperforms them across multiple public and private benchmarks. These results highlight the efficiency and value of domain-specific foundation models designed around the full spectrum of tasks within a clinical domain, and they emphasize the importance of rigorous, domain-aligned evaluation.